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1.  INTRODUCTION 


The  word  "adapt."  meaning  to  make  fit  for  a  new  situation  by  modification,  has 
appeared  in  filtering  literature  perhaps  since  1960  when  Widrow  and  Hoff  published  a 
paper  on  "adaptive  switching  circuits"  (l).  Since  then,  an  increasing  stream  of  papers 
and  books  on  adaptive  filtering  [2]  -  [42]  has  been  appearing  due  to  its  great  flexibility 
and  vast  applications.  Nowadays,  adaptive  filters  are  used  in  prediction  [7]  -  [lO], 
spectral  estimation  [lO]  -  [  12].  antenna  systems  [5],  [  12].  noise  cancellation  [6],  echo 
cancellation  [13]  -  [19].  channel  equalization  [20]  -  [26],  time  delay  estimation  [27]  - 
[29].  etc..  Its  success  is  indeed  tremendous. 

1.1  What  is  Adaptive  Filtering  ? 

"Adaptive  filters"  are  the  kind  of  filters  which  are  in  some  sense  self-designing  or 
self-modifying.  When  filtering  an  input  signal  as  a  usual  filter,  an  adaptive  filter  uses 
the  measured  input  and  output  signals  (and  usually  some  other  "reference"  signal)  to 
adjust  the  filter  characteristics  by  a  recursive  algorithm  so  that  the  filter  is  constantly 
optimized  in  some  statistical  sense  throughout  the  filtering  process.  Hence,  an  adaptive 
filter  apparently  performs  much  better  than  a  conventional  filter  for  the  same  set  of 
conditions.  This  is  why  adaptive  filters  are  so  powerful  and  have  drawn  so  much 
interest.  Although  adaptive  analog  filters  do  exist  [2].  adaptive  digital  filters  are  far 
more  popular  because  of  the  rapid  development  in  digital  computers  and  other  digital 
computing  techniques.  Throughout  this  dissertation,  only  digital  adaptive  filters  will 
be  considered  and  will  be  implied  whenever  "adaptive  filters"  are  mentioned. 


Two  subclasses  of  adaptive  filters  can  be  distinguished  analogous  to  conventional 
digital  fillers:  adaptive  finite  impulse  response  (FIR)  filters  and  adaptive  infinite  impulse 


response  (HR)  filters.  Adaptive  FIR  filters  have  been  the  major  topic  in  adaptive 
filtering  literature  because  of  the  following  two  reasons:  1)  since  they  have  no  poles,  the 
error  surfaces1  are  quadratic  and  thus  a  simple  gradient  search  works  well  for  the 
recursive  adaptation  algorithm:  2)  also  since  there  are  no  poles,  the  filter  is  always 
stable  (provided  some  convergence  conditions  in  1)  are  satisfied)  during  the  adaptation 
process.  Note  that  these  two  reasons  are  distinct,  although  related.  As  mentioned 
earlier,  the  pioneering  work  on  adaptive  (FIR)  filtering  was  done  by  Widrow  et  al.  [l]. 
Widrow  then  went  on  to  analyze  his  least  mean  square  (LMS)  algorithm  [3],  [4], 
Meanwhile,  he  proposed  a  number  of  important  areas  of  application  [4]  -  [6].  Later. 
Widrow's  LMS  method  was  extended  to  the  frequency  domain  where  an  FFT  can  be 
used  for  fast  processing  [30]  -  [32].  This  idea  was  then  further  generalized  to  so-called 
"transform  domain  LMS  algorithms"  and  was  quite  successful  [33].  At  the  same  time, 
researchers  also  tried  to  modify  Widrow's  LMS  algorithm  to  achieve  faster  convergence 
and  less  computation  [25],  [34]  -  [36].  Finally,  a  lattice  structure  for  adaptive  FIR  (also 
HR)  filters  was  also  proposed  [37]  -  [40],  Most  of  the  above-mentioned  progress  is 
fairly  well  summarized  in  two  recent  books  [41  ]  -  [42].  Indeed,  the  literature  on 
adaptive  FIR  filtering  is  vast.  On  the  contrary,  the  literature  for  adaptive  IIR  filtering  is 
much  less  extensive.  The  next  section  explores  this  in  more  detail. 

1.2  Adaptive  OR  Filtering 

The  advantages  of  infinite  impulse  response  (IIR)  filters  over  finite  impulse 
response  (FIR)  filters  are  well  known,  e.g..  for  the  same  performance.  IIR  filters  require 
much  less  computation  than  FIR  filters,  and  IIR  filters  can  usually  match  physical 
systems  well,  whereas  FIR  filters  often  give  only  rough  approximations  of  them.  This 

1  By  "error  surface,"  it  is  meant  the  mean  squared  value  of  the  output  error  (see  Figure  1 )  as  a  function 
of  adaptive  coefficients.  It  is  also  called  "performance  surface." 


is  also  true  in  the  field  of  adaptive  filtering.  However,  unlike  adaptive  FIR  filtering,  the 
error  surfaces  for  adaptive  HR  filters  may  not  be  unimodal.  and  the  poles  may  move 
outside  the  unit  circle  during  adaptation.  These  features  make  the  adaptive  1IR  filtering 
problem  much  more  difficult,  and  thus  it  was  abandoned  for  a  long  time  [43]. 

In  1976,  Feintucb  published  a  paper  on  an  adaptive  recursive  LMS  filter  [44] 
which  triggered  a  rebuttal  [43]  -  [46]  as  well  as  new  interest  in  adaptive  HR  filtering. 
Some  attempts  have  been  made  to  study  the  behavior  of  error  surfaces  for  the  "system 
identification  mode"  (Figure  1)  of  adaptive  HR  filtering.  Stearns  [47].  based  on  his 
numerical  experiments,  came  up  with  a  conjecture  that  if  the  adaptive  filter  is  of 
"sufficient  order"  (i.e..  if  na  ^ na  and  nb  ^ nb  ).  and  if  the  driving  input  x  (n  )  is  white 
noise,  then  the  error  surface  E  \e2(n  )|  is  unimodal.  where  E  {•}  denotes  the  expectation. 
Based  on  his  conjecture,  we  can  roughly  classify  the  error  surfaces  of  the  system 
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Figure  1  System  identification  mode  of  adaptive  1IR  filtering 
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identification  mode  in  a  stationary  stochastic  setting,  with  both  increasing  complexity 
and  practicality: 

1)  sufficient  order  with  white  noise  input 

2)  sufficient  order  with  colored  (noise)  input2 

3)  reduced  order  with  white  noise  input 

4)  reduced  order  with  colored  (noise)  input 

where  "reduced  order"  refers  to  the  situation  when  the  conditions  for  the  sufficient  order 
are  not  satisfied.  The  error  surfaces  for  1)  are.  based  on  Stearns'  conjecture,  unimodal 
(yet  to  be  proven).  The  other  three  classes  may  be  multimodal.  Evidence  of 
multimodality  for  case  3)  was  shown  in  [45]  and  [47],  and  for  case  2)  was  shown  in 
[48],  A  multimodal  example  of  case  4)  can  be  obtained  by  generalizing  the  example  in 
[48],  as  illustrated  in  Section  2.4.  Note  that  the  proposition  that  the  error  surfaces  of 
cases  2).  3)  and  4)  may  be  multimodal  does  not  rule  out  the  possibility  that  they  may 
be  unimodal  (even  for  na  >0).  An  example  of  a  unimodal  error  surface  for  case  3)  is 
found  in  [46].  In  general,  the  knowledge  about  these  error  surfaces  is  quite  limited. 
For  error  surfaces  involving  filter  orders  higher  than  two,  the  expressions  can  become 
very  complicated.  It  also  becomes  difficult  to  display  these  surfaces  since  they  are 
functions  of  more  than  two  variables. 

Early  work  in  adaptive  IIR  filtering  was  mainly  restricted  to  extending  Widrow's 
L\1S  method  of  adaptive  FIR  filtering  based  on  gradient  search  [43]  -  [46].  [49].  [50]. 
Typically,  the  technique  proposed  in  [49]  and  [50]  can  be  called  "Stearns'  algorithm."  As 
just  discussed,  however,  these  algorithms  may  have  a  guaranteed  global  convergence 
only  for  case  1).  This  severely  limits  their  usefulness.  It  should  be  pointed  out  here 

1  It  is  understood  that  to  ensure  possible  parameter  convergence  the  input  should  always  be  "persistently 
exciting"  [52],  [56]  -  (57). 


that  the  direct  extension  of  Widrow's  LMS  method  to  the  HR  case  results  in  the 
equation  error  approach  [5l],  which  shall  be  called  LMSEE  (least  mean  square  equation 
error).  Although  LMSEE  may  be  applicable  for  cases  1)  and  2).  it  is  well  known  that 
for  non-zero  measurement  noise  v(n  )  (see  Figure  1)  LMSEE  results  in  biased  estimates 
[51]  -[54], 

Recently  it  has  been  realized  [55]  that  adaptive  HR  filtering,  as  a  more  general 
setting  than  the  FIR,  is  closely  related  to  the  problem  of  recursive  parameter  estimation, 
a  well-developed  subject  [56],  [57],  This  broadens  the  view  of  researchers  in  signal 
processing  and  suggests  more  alternatives  for  adaptive  HR  filtering.  A  good  example  is 
given  in  [58]  and  [9]  where  the  author  adopted  a  recursive  maximum  likelihood 
estimation  algorithm  (RML2)  for  adaptive  HR  filtering  applications.  Although  RML2  is 
very  successful  in  system  identification  applications,  its  direct  use  in  adaptive  filtering 
may  not  be  very  desirable  due  to  its  computational  complexity,  vanishing  gain,  and 
potential  false  local  convergence  [57]  in  reduced  order  cases  3)  and  4). 

Another  application  of  recursive  identification  techniques  to  adaptive  HR  filtering 
is  represented  by  a  family  of  algorithms  based  on  the  concept  ol  hyperstability  [59]  - 
[66].  Among  these.  HARF  (hyperstable  adaptive  recursive  filter)  was  proven 
asymptotically  convergent  under  the  ’strict  positive  reality"  (SPR)  assumption  [60]. 
Note  that  HARF  is  potentially  applicable  to  cases  1)  and  2)  while  a  counterexample  for 
case  3)  was  reported  in  [62].  However,  the  SPR  requirement  is  a  major  obstacle  in  the 
practical  application  of  HARF  [62]  -  [66], 

Reference  [67]  gives  a  review  on  adaptive  HR  filtering  and  its  applications. 
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1.3  The  Scope 

Twenty  years  ago.  Steiglitz  and  McBride  proposed  a  scheme  for  system 
identification  [68],  Later.  Stoica  and  Soderstrom  analyzed  this  scheme  and  gave  an  "on¬ 
line"  version  of  it  [69],  [70].  In  this  dissertation,  a  family  of  stochastic  approximation 
variants  of  the  Steiglitz-McBride  scheme  is  developed  which  shows  a  close  relationship 
with  LMSEH.  Aiming  directly  at  filtering  applications,  the  contributions  of  this  work 
are:  i)  the  proposed  algorithms  have  a  non-vanishing  gain  which  enables  the  algorithms 
to  remain  active  and  to  track  time-varying  systems:  ii)  the  algorithms  are  "on-line"  and 
are  computationally  simple;  and  iii)  an  interesting  phenomenon  -  global  convergence 
regardless  of  local  minima  is  observed  which  makes  the  algorithms  promising  for  more 
general  error  surface  cases  3)  and  4).  The  asymptotic  behavior  of  the  proposed 
algorithms  is  similar  to  that  of  the  Steiglitz-McBride  scheme  as  analyzed  in  [69]  and 
[70].  The  major  drawback  is  that  the  estimates  are  biased  in  the  presence  of  colored 
measurement  noise  (disturbance).  However,  the  global  convergence  phenomenon 
motivates  the  study  of  this  family  of  algorithms.  Note  that  this  phenomenon  has  not 
been  observed  in  any  other  adaptive  HR  filtering  algorithms  (see.  e.g.,  [45],  [62]  and 
[64]). 

Despite  the  similarity  in  asymptotic  behavior  between  the  proposed  algorithms 
and  the  Steiglitz-McBride  algorithm,  the  convergence  mechanism  of  the  proposed 
algorithms  is  quite  different  from  the  latter  because  of  the  non-vanishing  gain.  Due  to 
the  lack  of  appropriate  mathematical  tools,  this  most  practical  category  of  adaptive  1IR 
filtering  problem,  non-vanishing  gain  in  a  stochastic  environment,  has  not  received 
significant  study  [67].  Only  very  recently,  a  series  of  papers,  e  g.  [71]  -  [74],  on  "weak 
convergence"  of  adaptive  algorithms  for  constant  gains  has  provided  us  with  such  a  tool. 
In  this  dissertation,  we  will  use  the  results  of  Benveniste  et  al  [71]  to  prove 
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convergence  of  the  proposed  algorithm  since  the  conditions  in  [71]  are  more  "tractable" 
than  those  in  others.  Specifically.  Benveniste's  results  will  be  used  to  translate  the 
original  problem  into  the  study  of  an  associated  ordinary  differential  equation  (ODE) 
regardless  of  the  orders.  The  study  of  the  ODE  for  sufficient  order  cases  l)  and  2)  then 
results  in  parameter  convergence  and  output  error  convergence.  This  material  is  all 
covered  in  Section  2.3  (also  see  [75])  following  the  development  of  the  algorithms  in 
Section  2.2  (see  [76].  [77]).  The  results  are  verified  by  computer  simulations  in  Section 
2.4.  For  the  reduced  order  cases  3)  and  4).  the  ODE  is  much  more  difficult  to  analyze. 
However,  computer  simulations  are  presented  in  Section  2.4  which  suggest  global 
convergence  for  these  cases.  A  discussion  on  convergence  rate  and  other  related  issues  is 
contained  in  Section  2.5.  Chapter  3  is  devoted  to  the  application  in  echo  cancellation.  A 
number  of  computer  simulations  are  presented  for  various  situations.  Finally.  Chapter 
4  concludes  the  dissertation  by  discussing  some  remaining  issues  and  open  questions  in 
adaptive  HR  filtering. 


2.  THREE  NEW  ADAPTIVE  HR  FILTERING  ALGORITHMS 


2.1  Problem  Formulation 


The  system  identification  mode  of  adaptive  HR  filtering  shown  in  Figure  1  is 
described  by: 


w(n  )=£a;  w(n  —  i  )+Y.bjX  ( n  —j  ) 
i  =1  j  =o 


y  (n  )=w  (n  )+v  (rc  ) 


y  (n  )=  (n  )y  (n  —  i  )+  ( n  )x  (n  —j  ) 

i  =1  j  =o 


e  ( n  )— y  (n  )-y  (/i ),  (id) 

where  it  is  assumed  that  all  the  poles  of  the  dynamic  plant  are  inside  of  the  unit  circle, 
and  the  numerator  and  the  denominator  of  its  transfer  function  are  relatively  prime.  If 
all  a;(n)' s  are  forced  to  be  zero,  i.e..  no  poles  are  involved  in  the  adaptive  filter,  we 
would  have  an  adaptive  FIR  filter.  In  this  case  it  is  easily  seen  that  the  mean  square 
error  is  quadratic  in  the  b,  (n  )'s: 


E  je2(n  )}=£■  |[y  (n  )—  £bj  ( n  )x  (n  —j  )]2) 

j  *o 

so  that  the  gradient  is  linear  in  bj  (n  )’s: 

Vj  (n  jl  E I e2(n  )}  )  =  — — -E  \e2(n  )}=-2 E  [e  ( n  )x(n -j  )}. 
'  dbjin) 


Widrow's  L\1S  algorithm  is  then 


bj  (n  +  !)=£,  (n  )— T’V*;(n  >1  E  |e2(n  )}  }=£,  (n  )+2r'£  \e  ( n  )x  (n  —j  )} 
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*=6;  (n  )+re  (n  )x  (n  —  y  ).  (4) 

It  was  proved  that  the  filter  weights  bj(n )  will  converge  to  their  optimal  Wiener 
solution  [3].  [4].  However,  due  to  the  structure  of  the  filter,  this  optimal  solution  may 
still  yield  a  significant  mean  square  error  if  the  number  of  weights  is  not  large  enough. 
This  motivates  the  study  of  the  adaptive  HR  filtering,  i.e..  allowing  the  freedom  in 
adjusting  a,  (n  )'s. 

The  problem  of  adaptive  HR  filtering  is  not  as  easy  as  that  of  FIR.  The  difficulty 
arises  from  the  recursions  in  y .  Some  existing  algorithms  mentioned  in  Chapter  1  are 
given  here. 

A)  Stearns'  (Recursive  Gradient)  Algorithm  (46] ,  (49]  -  (50/,  [64] ,  (67]: 

a,  (n+l)=ai(n)+re(n)y,  Xn):  *=1.2,  •  ■  sia  (5a) 

bj  (n  +1)=6;  (n  )+re(n  )x/(n  );  j  =0.1.  •  •  •  jtb  (5b) 

where 

yt  '(ra  )=y  (n  —  i  )+  (n  )y,  "(n  —  l):  i  =  1 .2.  •  •  •  Aa  (5c) 

/ =i 

"j 

Xj'(n)=x(n—j)+'£tal(n)xJ'(n—l);  j  =0.1.  •  •  •  Jib  .  (5d) 

/  =i 

Note  that  the  quantities  corresponding  to  the  gradient  in  (4),  i.e..  y,  Xn  )  and  x,  Xn  )  are 
computed  recursively  by  (5c)  and  (5d).  As  shown  by  the  analysis  in  [64],  [67]  and  by 
numerical  examples  in  [46].  this  algorithm  is  indeed  a  gradient  algorithm  and  converges 
only  to  a  local  minimum.  Thus,  it  may  only  work  well  for  the  error  surface  case  1). 

In  computing  y/(n)'s  and  Xj  ‘(n  )'s  as  given  in  (5c)  and  (5d),  (n„  +nA  +l)xna 
storage  space  is  required.  Since  y,  Xn )  and  x,  '{n )  are  updated  for  each  i  and  j 
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r> 


independently,  it  requires  (n0 +nb +\)x.na  multiplications  at  each  step  n.  Johnson 
mentioned  in  [67]  a  simplified  version  of  this  algorithm: 


a,  (n  +1  )=ai(n  )+re  (n  )y '(n  —i ):  i  =1.2.  •  •  .n3 


(6a) 


bj(n  +1  )=bj  (n  )+re  (n  )x  '(n  —j  ):  j  =0.1.  ■  Ab 


(6b) 


where 


y  '(n  )=y  (n  )+  £a;  (n  )y  '(n  — Z  ) 

/  =i 


(6c) 


x  ‘(n  )=x  (n  )+  £a{  (n  )x  '(n  — Z  ). 
/ 


(6d) 


In  other  words,  time  shifted  versions  of  y'Cn )  and  x  '(n )  are  used  instead  of 
independent  y/  '(n  )'s  and  '(n  )‘s.  In  comparison  with  (5c)  and  (5d).  (6c)  and  (6d) 
require  only  2na  storage  spaces  and  2na  multiplications  at  each  step.  For  slow 
adaptation  (small  t).  the  results  using  these  two  versions  are  essentially  interchangeable 
[67],  The  Stearns'  algorithm  can  be  thought  of  as  a  direct  extension  of  Widrow's  LMS 
algorithm  in  that  if  na  —0.  then  (5)  and  (6)  simplify  to  (4).  Another  important  point  is 
that  the  algorithm  cannot  guarantee  the  stability  of  the  filter  during  adaptation.  Thus, 
an  expensive  stability  monitoring  device  needs  to  be  incorporated  into  the  algorithm. 


B I  Simple  Hyperstable  Adaptive  Recursive  Filter  (SHARP)  f  59/  -  /67/: 

The  algorithm  was  proposed  as  a  simplified  version  of  the  hyperstable  adaptive 


recursive  filter  (HARF)  [60],  For  na  -na  .  nb  =nb  ,  and  v  (n  )=0.  the  SHARF  is  described 
as  follows: 


a,  ( n  +l)=a,  (n  )+re  '(/i  )y  (n  —  i  ):  i  =1.2.  Aa 


(7a) 


bj  (n  +1)=6,  (n  )+re  ’(n  )x  (n  —j  ):  j  =0.1.  ■  ■  Ab 


(7b) 
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e'(n)=e(n)+Y0cle(n—l)  (7c) 

1=1 

where  the  coefficients  ct ‘s  are  chosen  so  that  the  strictly  positive  realness  (SPR) 
condition 


1  +  Zc,:'' 


Re(/f(z)}>0,  for  all  I r  1=1;  H(z)=- 


l-Y.a,z  1 


is  satisfied.  While  SHARF  is  capable  of  dealing  with  error  surface  cases  1)  and  2)  and 
has  guaranteed  stability,  the  SPR  requirement  is  an  obstacle,  since  some  a  priori 
knowledge  about  the  unknown  plant  parameters  a t  s  is  needed.  After  exchanging  ideas 
through  extensive  correspondence  in  journals  and  conference  proceedings  [62]  -  [66], 
researchers  came  up  with  a  modified  SHARF  algorithm  that  eliminates  the  SPR 
requirement  as  well  as.  unfortunately,  the  guaranteed  stability  [63].  [66]: 

a,  (n  +  l)=a,  (n  )+re  '(n  )y  (n  —  i ):  t  =1,2.  •  ,na  (9a) 


bj  (n  +1  )=bj  (n  )+re  (n  )x  (n  —j  ):  j  =0,1,  ■  jib 


ck  (n  +1  )=ct  (n  )+re  *(n  )e  ( n  —k. );  k  - 1.2,  ••  •  Aa 


e  '( n  )-e  (n  )+  £c,  in  )e  (n  —  l ) 


provided  that 


C  (n  ,z  -1)=!  +  Y.Ci  (n  )r  '  has  all  roots  within  the  unit  circle  (10) 

i  =i 

during  adaptation,  which  requires  also  a  stability  monitoring  device  [66].  [67]. 


>  *  W  V 
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Note  that  a  rigorous  convergence  proof  applies  only  to  HARF.  which  is 
computationally  much  more  complicated  [60].  The  SPR  requirement  (8)  and  the 
stability  requirement  (10)  were  imposed  on  HARF.  Nevertheless,  for  small  r  SHARF 
remains  a  close  approximation  to  HARF.  and  thus  these  requirements  are  essentially 
applicable  to  SHARF  [59]. 

For  reduced  order  cases  3)  and  4),  SHARF  (or  HARF)  fails  to  converge  to  the 
desired  global  minimum  as  demonstrated  in  [62]. 

C)  Least  Mean  Square  Equation  Error  ( LMSEE )  Algorithm  [51 1  -  [ 54/ 

Another  way  of  directly  extending  Widrow’s  LMS  algorithm  to  the  HR  case  is  by 
filtering  the  output  error  through  an  all-zero  (post)  filter  identical  to  the  denominator 
of  the  adaptive  filter,  so  that  this  filtered  output  error,  or  the  "equation  error."  is  linear 
in  a,  (n  )  s  and  b]  (n  )‘s  [52]: 

e  '(n  )=e  (n  )—£a,  (n  )e  ( n  —i  ). 

i  =1 

=y  ( n  )—  £dj  (n  )y  (n  —i )— [y  (n  )—J~a,  ( n  )y(n  -i  )] 

i=l  i=l 

—y  ( n  (n  )y  (n  —i  )— £6;  (n  )x  (n  —j  ).  (11) 

i  =1  j=  o 

It  is  then  a  simple  matter  to  minimize  E  je  ,2(n  )}  by  the  widely  used  LMS  method,  i.e.. 

— -E{e'2(n  )l=2e'(n  )— -e '(n  )=— 2e'(n  )y(n— i  ):  t=1.2.  •  ■  jia  (12a) 

da,  (n)  Qa,  (n  ) 

— ^ — -E  ( e  '2(n  )|=2e  '(n  )— ^ — 2e  '{n  )x  (n  —j  ):  j  =0.1.  •  •  ■  Ab  (12b) 
such  that 
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a,  (n  +  l)=a,  (n  )+re  *(n  )>'  (n  —i  ):  i=1.2.  •  ■  ,rca  (13a) 

bj(n+\)=bj(n)+TeXn)x(n—j)-.  j  =0.1.  ■■■  sib  .  (13b) 

This  approach  minimizes  the  (squared)  equation  error  but  does  not  necessarily  minimize 
the  (squared)  output  error  unless  the  minimum  values  are  zero  (one  being  zero  implies 

"j 

the  other  as  clearly  seen  in  (11).  assuming  1—  £a,(n  )z~‘  having  all  its  roots  within 

i  =i 

the  unit  circle),  which  corresponds  to  the  error  surface  cases  1)  and  2)  when  the 
disturbance  v  (n  )=0.  For  non-zero  v  (n  )  the  parameter  estimates  are  generally  biased 
[5 1  ]  -  [54].  Debiasing  requires  a  priori  knowledge  about  the  statistics  of  v  (n  )  and  thus 
may  not  be  desirable  [5l]. 

Other  adaptive  HR  filtering  algorithms  also  exist,  e.g.,  RML2  adopted  by 
Friedlander  [58],  However,  as  mentioned  before,  due  to  the  computational  complexity 
and  other  reasons,  they  are  not  presented  in  this  dissertation. 

2.2  Development  of  the  New  Algorithms 

The  family  of  the  new  algorithms  is  developed  cased  on  the  LMSEE  algorithm. 
Consider  for  the  moment  x'  (n  )  as  the  input,  e'  ( n  )  as  the  output  error,  and  v'  (rt )  as 
the  disturbance  in  Figure  2  (this  subsystem  is  the  same  as  Figure  1).  e  (n  )  then  becomes 
the  "equation  error"  as  described  in  Section  2.1  and  can  be  expressed  as 

"j 

e  (n  )=e‘  (n  )—  £a,  (n  )e’  (n  —  i  ).  (14) 

i=i 

where 

e' (n  )=y' (n  )—y  Xn  )  (15a) 
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Figure  2  System  identification  mode  of  the  proposed  algorithm 


'Ij 

w  ( n  )=  vv  (n  ~i  )+  T.bj  x‘  ~j  ) 

<*>  /  =n 


y’  (n  )=w'  (n  )+v’  (n  ) 


J 

v'  (n  )=v  (n  )+  £a  (n  )v"  (n  — t  ) 

■  =i 


>*  ’  (n  )=  £a,  (n  )v  1  (n  —i  )+  (n  )x'  (n  —j  ). 
;  =  1  ;  =•* 

analogous  to  ( 1 ).  Substituting  ( 15a)  and  ( I5e)  into  (14)  results  in 
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=y'  (n  )—  £a,  (n  )>•'  (rc  — t  )— £6,  (n  )x"  (n  —j  ). 
i=l  _/  =o 


(16) 


Here  e  (n  )  is  linear  in  a,  (n  )'s  and  b,  (n  )‘s  assuming  y'  and  x'  are  independent  of  these 
adaptive  coefficients  for  the  moment.  Then  minimization  of  E  |e2(n  )}  yields 


— U2(n  )}=:2e  (n  ) — - e  (n  )=— 2e  (n  )y’  (n  —  t  ): 

0a,(*)  8a.(«) 


7-4 — r-E  l«2(n  )}^*2e  (n  ) — — -e  (n  )=— 2e  (n  )x*  (n  —j  ): 
do,  (n  )  dbj(n) 


i=l,2.  (i7a) 

y  =0,1,  ■  (17b) 


such  that 


a*  (n  +  l)=a,  (n  )+re  (n  )y‘  (n  —  i  );  t  =1,2,  •  •  •  Aa  (18a) 

bj  (n  +l)=6y  (n  )+re(n  )x'  (n—  /'  );  j  =0,1 .  ■  ■  ■  ,nb  .  (18b) 

As  stated  before,  this  approach  minimizes  the  (squared)  equation  error  but  not  the 
(squared)  output  error.  To  circumvent  this,  an  all-pole  filter  which  is  the  inverse  of  the 
post-filter  is  added  as  a  pre-filter  at  the  input,  as  shown  in  Figure  2.  Mote  that  the 
overall  response  from  x  (n  )  to  e  (n  )  is  not  the  same  as  that  of  Figure  1.  due  to  the 
time-varying  nature  of  the  filters.  However,  for  slow  adaptation  it  is  a  close 

approximation  to  that  of  Figure  1.  Moreover,  at  a  convergence  point  where  all  adaptive 

coefficients  tend  to  be  constant,  these  two  systems  become  exactly  the  same.  Thus  e  ( n  ). 
the  equation  error  with  respect  to  x'  ( n  ).  can  be  thought  of  as  the  output  error  for  the 
input  x(n).  Then  the  quadratic  minimization  scheme  for  the  equation  error  ((14)  - 
(18))  can  be  applied  provided  x'  (n  )  and  y'  (n  )  are  independent  of  a,  ( n  )  s  and  b,  ( n  )’.;. 
Due  to  the  pre-filter,  this  is  apparently  not  true.  However,  we  observe  that  Equation 
(17b)  is  valid,  since  x'(n  )  and  y'  (n  )  are  independent  of  b,{n  )’s.  and  the  derivative  of 
e  (n  )  with  respect  to  a,  (n  ).  considering  (15b).  (15c).  ( I5d)  and  (16),  is 
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- - - e  (n  )=— ^ — -[w'  ( n  )+v'  ( n  )]—  Z°i  in  — -v'  ( n  —l  )—y‘  (n  —i ) 

0a,  (n)  0a,  (n)  /=i  0a,  (n  ) 


-Zhin)^ . -x'(n-j) 

j  =o  0a,  ( n  ) 


=  Zai  —r T~TW' (n  ~l  )+  Z  *>y  Tj4 — r*‘  («  "7  )+  Z5/  («  )T^T_TV’’  (n  ) 

/=i  0a;  (n  )  j  =o  Qa,  (n)  /=i  0a*  (n ) 


+v'  (n  —  t  )—  £a7  (n  )— — t-I**'  (n  - £  )+v*  (n  —Z )]—  Z  h  ( n  )— s4 — («  ~j  ) 

/j=i  0Oj  (n )  j  *o  0a*  (n  ) 


— y'  (n  — t  ) 


=  21  [«2/  — <2/  (^T. )]  --y-  -  w‘  (n  — Z  )+Zfy"f/(n  )]-r^~7"  \ x  ~j  ^~w  (n  ) 
/si  0Oj  (n  )  /  *o  Qa,  (n  ) 


s=— w'  (n  — i  );  t  =1.2.  ■  •  ,na  . 


where  n  1=max{na  .  na  )  and  n2=max{n4  .  nb  }.  From  this  it  is  seen  that  Equation  (17a) 
is  an  approximation  in  that  w*  is  replaced  by  y'  and  the  two  summation  terms  are 
ignored  in  the  last  step  of  (19).  Note  that  this  is  a  reasonable  approximation  for  the 
following  reasons.  First,  because  v  (n  )  [and  hence  v*  ( n  )]  is  unmeasurable.  >•'  (n  —  j  )  is 
the  only  measurable  quantity  that  gives  an  approximation  for  w'  (n—j  ).  For  v(n  )=0 
this  approximation  becomes  exact.  Second,  for  the  sufficient  order  case  na  =na  .  nb  -nh . 
the  two  summation  terms  in  (19)  indeed  approach  zero  ntai  the  (parameter) 
convergence  point  and  hence  resulting  in  a  gradient  nature  at  that  point  which 
guarantees  the  local  convergence  at  that  (global  minimum)  point.  Whereas  at  other 
points  (possibly  other  local  minima)  the  two  ignored  summation  terms  are  not  zero,  and 
hence  the  algorithm  does  not  exhibit  a  gradient  behavior,  which  may  prevent  itself 
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-y'  (n  )—]Ta,  (n  )>•'  ( n  —i  )—  ( n  )x‘  (n  —j  ).  1,16) 

••=1  J  =o 

Here  e  (n  )  is  linear  in  a,  (n  )'s  and  5,  (n  )’s  assuming  y’  and  x‘  are  independent  of  these 
adaptive  coefficients  for  the  moment.  Then  minimization  of  E  le2(n  ))  yields 

— ^ — -E  {e2(n  )}»=2e  (n  )—J~ — -e  (n  )=—2e  (n  )y'  (n  —  i  );  i=  1.2.  ■  jia  (17a) 

da,  in)  fta,  (n  ) 

- — -E{e2(n  )}=2e(n  ) - ^ — -e  (n)=-2e  (n)x‘ (n -j  ):  j=0.1.  ■  .nb  (17b) 

dbj(n)  3*;(n) 

such  that 


a,  (n  +1  )=a(  (n  )+re  (n  )y‘  (rt  —i  ): 

i  =1.2.  ■  •  •  na 

(18a) 

bj  (n  +1  )-bj  (n  )+re  (n  )x'  (n  —j  ); 

j  *0,1.  ■  -  -  Ab . 

(18b) 

As  stated  before,  this  approach  minimizes  the  (squared)  equation  error  but  not  the 
(squared)  output  error.  To  circumvent  this,  an  all-pole  filler  which  is  the  inverse  of  the 
post-filter  is  added  as  a  pre-filter  at  the  input,  as  shown  in  Figure  2.  \ote  that  the 
overall  response  from  x(n)  to  e{n  )  is  not  the  same  as  that  of  Figure  1.  due  to  the 
time-varying  nature  of  the  filters.  However,  for  slow  adaptation  it  is  a  close 
approximation  to  that  of  Figure  1.  Moreover,  at  a  convergence  point  where  all  adaptive 
coefficients  tend  to  be  constant,  these  two  systems  become  exactly  the  same.  Thus  e  (n  ). 
the  equation  error  with  respect  to  x'  (n  ).  can  be  thought  of  as  the  output  error  for  the 
input  x(n ).  Then  the  quadratic  minimization  scheme  for  the  equation  error  ((14)  - 
(18))  can  be  applied  provided  x'  (n  )  and  >■'  (n  )  are  independent  of  a,  (n  )  s  and  b,  (n  )’s. 
Due  to  the  pre-filter,  this  is  apparently  not  true.  However,  we  observe  that  Equation 
(17b)  is  valid,  since  x‘  ( n  )  and  y‘  ( n  )  are  independent  of  b,  (n  )'s.  and  the  derivative  of 
e  (n  )  with  respect  to  a,  (n  ).  considering  ( 1 5b).  ( 15c).  ( 15d)  and  (16).  is 


__2 — _e  (n  )=— ~ — -[w'  (n  )+v‘  ( n  )]—  £a,  ( n  )— 1-7 — rv'  ^ n  ) 

9a,  (n)  Qa,  (n)  ,=i  6fli(n) 


-  £fr,  (n  )-^ — (n -j  ) 

,  =o  ‘  da,{n) 


=  Za/  —--7  Tw' (n  “* )+  Z  TTT^  (*  -y  )+  Z<*/  (n  )-rr 7 -Tv'  (n  -l ) 

/=i  d°i  (n  )  y=o  9°,  (a  )  /=i  Qa,  (n) 


+v*  (n  — t  )—  ]Ta;  (n  )— =-7 — («  —  Z  )+v‘  (n  — Z  )]—£&,  (n  )— ^ (n  —j  ) 
/  =1  Qa;  (n  )  y=o  9a,  (n ) 


— y‘  (n  — *  ) 


=  Z[fl/  («  )]— -w‘  (n  -Z )+  £  [fry  (*  )]  -■.  /  r-*'  (n  ~j  )~w'  (n.  -i ) 

9a,  (n  ) 


9a,  (n  ) 


=— w'  (n  — i  ):  i  =1.2.  • 


where  rii=max{na  .  na  }  and  n  2=max|/t6  .  nb  }.  From  this  it  is  seen  that  Equation  (17a) 
is  an  approximation  in  that  w'  is  replaced  by  y'  and  the  two  summation  terms  are 
ignored  in  the  last  step  of  (19).  Note  that  this  is  a  reasonable  approximation  for  the 
following  reasons.  First,  because  v  (n  )  [and  hence  v’  (n  )]  is  unmeasurable,  y'  ( n  —  j  )  is 
the  only  measurable  quantity  that  gives  an  approximation  for  w'  ( n—j  ).  For  v(n  )=0 
this  approximation  becomes  exact.  Second,  for  the  sufficient  order  case  na  =n0 .  nb  =nf, . 
the  two  summation  terms  in  (19)  indeed  approach  zero  near  the  (parameter) 
convergence  point  and  hence  resulting  in  a  gradient  nature  at  that  point  which 
guarantees  the  local  convergence  at  that  (global  minimum)  point.  Whereas  at  other 
points  (possibly  other  local  minima)  the  two  ignored  summation  terms  are  not  zero,  and 
hence  the  algorithm  does  not  exhibit  a  gradient  behavior,  which  may  prevent  itself 


from  being  trapped  at  a  local  minimum  as  other  algorithms  are.  For  reduced  order  cases 
the  situation  is  more  complicated  and  it  may  also  very  well  be  these  ignored  terms  that 
make  the  algorithm  converge  globally  (see  Section  2.4).  Another  more  heuristic 
argument  for  this  approximation  may  be  as  follows.  At  each  step  of  adaptation,  the 
input  x  in  )  is  pre-filtered,  producing  x  in  ).  before  the  adaptive  coefficients  are  updated. 
It  is  thus  assumed  that  the  pre-filter  is  a  constant  filter  with  coefficients  fixed  at  a,  ( n  )'s 
at  each  step  of  adaptation  n  .  Then  x  in  )  and  y  (n  )  can  be  thought  of  as  independent 
of  the  coefficients  a,  (n  )‘s  and  b,  in  )’s  in  the  adaptive  filter  and  the  post-filter  at  the 
time  instant  n  .  The  quadratic  scheme  (14)  -  (18)  can  still  be  applied.  After  updating, 
the  coefficients  in  the  pre-filter  are  also  changed,  resulting  in  a  new  fixed  pre-filter  for 
the  next  adaptation.  This  argument  is  more  valid  for  the  "IF"  algorithm  appearing  next. 
In  short.  Eqns.  (14).  (15).  and  (18)  together  with 

x'  (n  )=x  in  )+£a,  in  )x‘  in  —  i  )  (20) 

.=i 

constitute  one  version  (system  identification  mode)  of  the  proposed  algorithm.  It  shall 
be  referred  to  as  SIM  algorithm  in  the  sequel. 

Note  that  since  v(n  )  is  not  measurable.  (I5d)  is  not  practically  realizable.  Thus 
SIM  algorithm  is  applicable  only  when  v(n  )=0.  Also,  in  the  general  adaptive  filtering 
mode,  yin)  is  usually  the  given  data  ("desired  response")  so  that  y'(n)  cannot  be 
obtained  by  feeding  x'(n)  into  the  (unknown)  dynamic  plant  and  measuring  the 
output,  as  described  by  (15b)  -  (15d).  In  this  case,  another  version  of  the  proposed 
algorithm  can  be  comprised  of  Equations  ( 14).  ( 15a).  ( I5e).  (18).  (20)  and 

y'  ( n  )=y  (n  )+  in  )y'  (n  — t  ).  (21 ) 

i  =i 

where  yin  )  is  the  given  data  which  can.  for  example,  be  modeled  by  (la)  and  (lb). 


IS 
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Note  that  the  two  pre-filters  ((20)  and  (21))  are  necessary  to  obtain  x'  (n  )  and  y'  ( n  )  to 
update  the  coefficients.  However,  it  is  not  necessary  to  calculate  e  ( n  )  using  (14).  (15a). 
(I5e),  (20).  and  (21).  In  fact,  the  pre-filter  and  the  post-filter  approximately  cancel  as 
argued  before  and  thus  e(n)  can  be  calculated  directly  from  (id).  Therefore,  the 
adaptive  filtering  mode  of  the  proposed  algorithm  (AFM  algorithm)  is  described  by 
Equations  (lc).  (id).  (18).  (20).  and  (21).  Figure  3  shows  the  implementation.  Note 
that  besides  the  adaptive  filter  and  the  coefficient  updating  mechanism  (18).  only  two 
additional  filters  are  used.  Thus,  it  is  computationally  simple  and  efficient. 

Analogous  to  (5)  and  (6).  as  in  Stearns'  algorithm,  a  somewhat  more  complicated 
version  of  the  proposed  algorithm  can  be  obtained  as  follows.  Instead  of  using  a  time- 
shifted  version  of  y'  and  x‘.  i.e..  y'(n-i)  and  x'(n-j)  as  in  (18).  a  set  of 
independently  filtered  quantities,  v,  ‘(n  )  and  x; *( n  )  is  used.  They  are  obtained  by 


f. 
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v,  '(n  )=y  (n  —i  )+  £a,  (n  )y,  ‘(n  —l  );  £=1.2.  ••-.na  (22a) 

/  =i 

"j 

*,  '(n  )=x  (n  —j  )+  ^a,  (n  )xj  '(n  —  l):  j  =0.1.  •  •  •  ,nb  .  (22b) 

/=i 

This  can  be  applied  to  either  SIM  or  AFM.  Since  AFM  algorithm  is  more  practical  and 
can  also  be  used  for  the  system  identification  purpose,  only  an  AFM  version  of  this 
independent  filtering  (IF)  algorithm  is  considered.  As  in  the  Stearns'  algorithm,  the 
storage  space  and  the  number  of  multiplications  at  each  step  will  increase  from  2 na 
(AFM)  to  ( na  +nb  +l)xna  (IF).  Although  the  IF  algorithm  does  not  seem  to  be 
computationally  more  advantageous  than  the  others,  it  allows  the  development  of  a 
rigorous  convergence  proof  presented  in  the  next  section,  which  will  guide  the  use  of  the 
AFM  algorithm,  since  it  is  a  close  approximation  of  the  IF  algorithm  for  slow 
adaptation.  One  can  easily  see  that  if  a ,  ( n  )'s  are  not  time-varying,  then  (22)  is  the 

same  as  the  time-shifted  versions  of  (20)  and  (21).  Although  theoretically  these  three 

algorithms  are  all  different  due  to  the  time-varying  nature  of  the  filters,  asymptotically 
they  behave  the  same  because  as  the  filter  coefficients  converge,  these  filters  become 
more  time-invariant  (see  Section  2.3).  Practically,  for  slow  adaptation  (usually  the 
case)  even  at  points  other  than  the  convergent  ones,  these  algorithms  still  behave  almost 
the  same.  Our  computer  simulations  show  that  the  convergence  paths  and  mean  square 
error  progressions  using  these  three  algorithms  are  almost  indistinguishable  (see  Section 
2.4). 

Finally,  all  the  three  proposed  algorithms  are  summarized  below. 

A  >  SIM  Algorithm  (Figure  2): 

a,(n +l)=a,(n  )+re(n  )v'(n -i  ):  f  =1 .2,  •  •  •  ,n„  (23a) 
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bj  (re  +  1)=£;  ( re  )+re  ( re  )x  '(re  —j  );  j  =0.1,  ■  ■  .hb 


x  '(re  )=x  (re  )+  ( re  )x  '( re  —i  ) 

i  =1 


e  ( re  )—e  '(re  )—  £a,  ( re  )e  '(re  —i  ) 

i=i 


e  '(re  )-y  '(re  )—y  '(re  ) 


w  *(n  )=  £a,  w  '(re  —  i )+  £  x  '(re  —j  ) 

■  =1  j=0 


y  '(re  )=w  '(n  )+v  '(re  ) 


v '( re  )=v  (re  )+  £a,  (re  )v  '(re  —i  ) 

i  =i 


y  '(n  )=£a,  (re  )y  '(re  — t  )+  (re  )x'(re  —  j  ) 
i  =1  ]  =o 


5)  AFM  Algorithm  (Figure  3): 

ctiin  +1)=^  (re  )+re(re  )y  '(re  —  i  );  i  =1.2.  ■  sia 

bj  (re  +  l)=ty  (re  )+re  (re  )x  '(re  —  j  ):  j  =0.1.  •  •  ■  ,hb 


y  '(re  )=y  (re  )+  £a,  (re  )y  '(re  — i  ) 

i  =1 


x  '(re  )=x  (re  )+  £a,  (re  )x  '(re  —  i ) 

i  =i 


•/C  -cT  vr.  -A 


(23b) 

9 

(23c) 

8 

(23d) 

l 

l 

(23e) 

(23f) 

i 

(23g) 

s 

■ft 

(23h) 

1 

»> 

>* 

(23i) 

V 

§ 

(24a) 

•v 

(24b) 

■  J 

is 

(24c) 

ri* 

£ 

(24d) 
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e  in  )=y  in  )—y  in  ) 

(24e) 

with 

yin  )=£a,  in  )yin  —i )+  £  bj  in  )x  (n  —j  ) 

<=1  j  =0 

(24f) 

C)  IF  Algorithm: 

dL  in  +l)=a,  in  )+re  (n  )y,  'in  );  i  =12.  ■  ■  •  A, 

(25a) 

bjin  +1  )~bj  in  )+re  (n  )xj  'in  ):  j  =0.1.  •  Ab 

(25b) 

"j 

yi’in  )=y  in—i  )+£a,(n  )yi‘in—l  );  i  =1,2,  •  •  •  Aa 

f=i 

(25c) 

x/in)=xin-’j)+'£idiin)xj‘in-l):  j~ 0.1.  •  •  ■  Jib 

/=*  l 

(25d) 

e  (n  )=y  in  )—y  in  ) 

(25e) 

with 

yin  )=£a;  (n  )y(n  —i  )+  £6;  (a  )x  in  —j  )  (25f) 

i=X  j=  o 

It  is  interesting  to  note  that  the  only  difference  between  AFM.  IF  algorithms  and  the 
corresponding  versions  of  Stearns'  algorithm  is  that  in  these  algorithms  y  is  filtered  to 
obtain  y  '  whereas  in  Stearns'  algorithm  y  is  filtered  to  obtain  y  '. 

2.3  A  Convergence  Proof 

The  convergence  proof  presented  in  this  section  is  for  the  IF  algorithm  only.  A 
convergence  proof  for  the  AFM  algorithm  has  not  been  developed  yet.  However,  as 


£ 

■ 
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mentioned  before,  the  AFM  algorithm  is  a  close  approximation  of  the  IF  algorithm  for 
slow  adaptation.  Thus,  the  theory  developed  for  the  IF  algorithm  can  also  be  guidelines 
for  the  AFM  algorithm.  In  the  following,  a  theorem  of  wide-sense  convergence  in 
probability  by  Benveniste  et  al.  [71]  is  presented.  Then  the  original  problem  is 
translated  into  the  study  of  an  associated  ODE.  Finally,  the  study  of  the  ODE  gives  us 
the  desired  convergence. 

2.3.1  A  Theorem  of  Wide-sense  Convergence  in  Probability 

Consider  the  recursive  parameter  estimation  scheme 

e„  +!=©„  +rvn  (e„ ).  e0=b0eDc  (26) 

where  0„  is  the  parameter  vector  estimate  of  dimension  d  at  the  n  th  step.  Here,  the 
capital  0  is  used  to  denote  the  randomness,  whereas  the  lower  case  &  denotes  a 
deterministic  parameter  vector.  V„  (0)  is  a  stationary  random  vector  function  over  Dc 
which  is  in  turn  a  prescribed  compact  region  in  fRd  .  r>0  is  the  constant  gain.  Consider 

also  an  associated  ordinary  differential  equation  (ODE)  with  (26): 

e«»=e0.  (27) 

at 

Here  P[0(r  )]=E  {V„  (0)}  l^,  )  depends  on  t  but  not  on  n  since  Vn  (0)  is  stationary. 
Assume  that  (27)  has  a  solution  9‘  (t ),  t  e/R  ,  which  satisfies 

0*  it  *DC .  te/R.  (28) 

Then  the  following  theorem  is  due  to  Benveniste  et  al.  [71]. 

Theorem: 
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If  the  following  assumptions  are  met: 

i)  Smoothness.  There  exists  a  positive  random  variable  C  x(a>)  such  that  £(C  i)<oo  and 


for  all  9t.62eDc : 


||  (0,.a»)-V„  (02m)  ||  ,( a)  ||  9-02  | 


ii)  Boundedness.  There  exists  C2<°°  such  that 


fuj>  E  ||  V„  (0)  || 2  <C2: 

6*DC 


iii)  Mixing  Condition.  There  exists  a  stationary  random  sequence  {£„  }nt^  defined  over 
the  probability  space  { ft .  3 ,P  }  with  values  in  some  measurable  space  {£  .  }.  which  is 


<£-mixing  in  the  following  sense: 


Z  <f>fn  <  oo 


where 


A  6<x{f, ,  l  ^0} 

4>k=sup  I PU  \B)-P{A  )|:  Be<r{^+i.n>0} 

Furthermore,  there  exists,  for  every  O^m  ^oo,  a  function  V„m(0)  which  is  3£>(DC  )® 
cr{|n_m  4*  +m  l-measurable,  and  a  decreasing  sequence  \um  }m?0  such  that 


where 


Z  vm2<°° 

m  =0 


vm>sup  E  ||  Vn  (0)-V?(O)  ||  2 

6*D. 


Then,  for  S  <oo  fixed,  there  exist  two  positive  and  finite  constants  C  and  C\  and  a 
positive  function  e(r)  going  to  zero  with  r,  such  that  the  estimates  in  (26)  satisfy 

P  sup  ||  9,  — 0‘ (n  r)  ||  >Ce(r)  <C'€(r)  (29) 


'i»-*  W' 
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where  0'  (r  )  is  given  in  (28).  Furthermore,  if  9'  =  Iimo  (i  )  exists,  then 


P  ||  ©  s_1J-d'  ||^7)+Ce(r)  <C'e(r) 


where  [•]  denotes  the  upper  nearest  integer  and  r)  is  such  that  ||  9‘  (S  )—9'  ||  <t). 

Remark:  For  adaptive  HR  filtering  application,  the  stability  is  of  most  importance. 
Hence,  the  theorem  stated  here  is  actually  a  modified  version  of  that  of  [7 1 )  in  that  we 
are  restricted  to  work  in  Dc  instead  of  .  Its  importance  will  be  clear  in  the  next 
section. 

The  expression  of  e(r)  and  other  loosely  related  information  will  not  be  covered 
here.  One  can  refer  to  [7 1  ]  for  further  details. 


2.3.2  Association  of  the  Algorithm  with  the  ODE 


Using  the  unit  delay  operator  q~l,  i.e.,  q -,x in  )=x  in  —  1).  the  following 
polynomial  operators  are  introduced: 


A  ( n  ,  q~i)=\—a  ,(n  )q  _1—  ■  — an-  in  )f‘ 


B  (n  ,  q~l)=b0(n  )+b  tin  )q  _1+  •  in  )q 


A  -1)=  1— a  -1—  -an  q~n* 


B(q  _1)=60+6  +&„V 


If  the  HR  filter  and  the  dynamic  plant  in  Figure  I  are  considered  as  operators,  then 
A  ( n  .  q~ and  Bin  .  ^-1)  are  the  denominator  and  the  numerator  of  the  HR  filter,  and 
A  iq  -1)  and  Biq~l)  are  that  of  the  dynamic  plant.  Note  that  their  orders  na  .  nh ,  n„  . 
and  nb  are  arbitrary. 


.  vv 


I 


25 


Let  x  (n  )  be  the  input  signal,  v  (n  )  be  the  disturbance  (or  measurement  noise). 


a\  in  ) 


a-  ( n  ) 

ft  —  "•» 

b0(n ) 


k(n) 


(n  — !)+-__ — 1 — — v  (n  — 1) 


A(n  .  q  x)  A  {q  l) 


A  (n  .  q  ‘) 


1  B(q~l)  ,  .  v  ,  1  ,  -  . 

T7 - — •  ------x  (n  — na  -  v(n-na  ) 

,0„  )=  ^  (n  .  ?  ‘)  /I  (?  ^  («  •  ?  l) 


Ain  .  q  x) 


x  (n  ) 


^4  (n  ,  q~l) 


—x  (n  -nh  ) 


T  hen  the  II-  algorithm  is  (25)  is  a  close  approximation  (see  [70],  [75])  of 


©r,  +i =9*  +T<t>(n  ,0„  )e  ( n  .©„  ) 


where 


f  "  i\  x  ( n  — 7 — - rr-#(*-?  l)x(n). 

A  (qr~l)  Mn.q "‘) 


It  is  obvious  that  (31)  is  in  the  form  of  (26)  with  Vn  (0„  )=<t>(n  ,0„  )e  (n  ,©„  ). 

Now  define  the  compact  regions  Dc  and  Dc  as  closed  subsets  of  stability  regions. 
At  C  Ds  =!0:  All  zeros  of  A  (:~l)  are  inside  of  the  unit  circle  } 
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Dc  C  Ds  ={0:  All  zeros  of  A  (z  *)  are  inside  of  the  unit  circle  }. 


Note  that  the  dimensions  of  Dc  and  Dc  may  be  different.  Usually  Dc  and  Dc  should  be 


taken  as  close  to  Di  and  Ds  as  possible.  We  now  have 


Theorem  1: 


Let  x(n).  v(n)  be  stationary  processes  with  finite  1st.  2nd  and  4th  moments. 


Assume  that 


a)  A  (q  *)  is  stable,  i.e.,  9eDc  : 


b)  A(n  .  q  l)  is  stable  for  all  n  .  i.e..  0„  eDc  :  and 


c)  |x  (n  ).  v  (n  )}  is  ^-mixing  in  the  following  sense: 


Z<f>t'2<° o 

t  =o 


where 


Ue<r{[x  (n  ),v(n  )]:  n  <0| 

<f>k=sup  \P(U\V)-PUn\:  v«r{U(»+*).v(n+*)]:n>0} 


Then  the  three  conditions  in  Benveniste’s  theorem  are  satisfied,  and  hence  the  proposed 


adaptive  IIR  filtering  algorithm  converges  to  the  solution  of  the  ODE  (27)  and  (28)  in 


probability  as  given  in  (29)  and  (30). 


To  prove  Theorem  1.  we  first  introduce  two  lemmas  which  are  of  interest  in  their 


own  rights.  The  proofs  are  contained  in  the  Appendix. 


Lemma  1: 


Consider  the  following  stochastic  iinear  time-invariant  system: 


Y  in  .  9^=F  (0)}'  (n  -1 . 0)+C  (9)U  (n  ) 
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A  *  * 

where  £(0),  G(9)  are  deterministic  bounded  matrices.  U  in)  and  Y  (n  ,9)  are  stationary 
stochastic  processes  with  £{||(/(n)||}  bounded  and  the  initial  value  Y(0,  0)=}'(O) 
bounded.  Furthermore,  assume  that 

i)  £(0)  has  all  its  eigenvalues  inside  of  the  unit  circle  (which  may  be  translated  into 
constraints  on  9,  e.g.,  9eDc  );  and 

ii)  F(9)  and  G  (9)  are  Lipschitz.  i.e.,  there  exist  0<C  x.  C 2  < 00  such  that3 

||F(8, )-F(S,)|| sc.iie,— e2||  .  .  . 

9l.92eDc . 

|jG(01)-O(02)||^C2||M2|| 

Then.  Y (n  .0)  is  also  Lipschitz  in  0  for  all  n  .  i.e.,  there  exists  a  random  variable  Y>0 
with  E  { Y}  <oo  such  that 

||  Y  (n  .0,)-y  (n  ,02)  ||  <  Y  ||  0,-02  || .  (35) 


Lemma  2: 

Consider  four  linear  time-invariant  HR  filters  having  stable  rational  transfer 
functions.  Let  u,  (n  )  denote  the  i  th  stationary  input  and  y,  (n  )  the  i  th  output. 
Furthermore,  assume  that  u;  (n  )  has  finite  1st.  2nd  and  4th  moments,  i.e.. 

0  ^  |  E  {u,  (n  )}  |  ^  M  i  <  oo 

O^E\ui  (n  )2KM2<oo  for  all  „  (36) 

0^£  {u,  (n  )4}  4<oo 

Then  y,  (n  )  has  finite  1st.  2nd.  3rd  and  4th  moments,  i.e.. 


3  For  the  definition  of  matrix  norms,  see  [78],  Here  we  use  ||A  ||=[rr.  t.4r,4  )]' J=[£  !a,y  I3]1  3  which  is 

l  ... 

compatible  with  1 2  vector  norm. 


•  ~a...  >.♦  «- 


0<|£{y,(n)}  !«£*!<«> 


0^  |£  {y,  (n  )y^  (m  ))  |  ^K2<c° 


0<  |£{y,  (n  )yy  (m  Jy*  (£)}  |  <A'3<oo 


0^  \E  {y,  (.n  )y;  (m  )yi  (£)y;  (0}  I  <AT4<< 


for  all  l^t  ,j  Jc  J.  ^4  and  — oo<n  jn  .£.£<«• 


Proof  of  Theorem  1: 


The  three  conditions  in  Benveniste's  theorem  are  all  in  terms  of  the  deterministic 


time-invariant  parameter  9eDc  instead  of  the  stochastic  time-varying  0„  .  This  makes 


the  analysis  much  easier.  As  shown  before,  the  proposed  algorithm  can  be  written  as  in 


AAA  A 

(26)  with  Vn  (0)=#(n  ,9)e  ( n  .9).  Because  of  time-invariance,  e  (n  ,0)  as  given  in  (32) 


can  be  written  as 


e  (n  .0)= — )~  —  ? — p-*  (n.  —  i  )  — -  l  ■  x  (n  -j  )  0+v  (n  ) 

A  (q  l)A(q  *)  A  (q  ^Af?-1)  A  (q  l) 


1  ^ na  .  0^  j  ^nb 


~  a  (g-*) x  <ft  )+Xcy~*yv 


- — - — —  x  ( n  —i  )+ - - - v  (n  — i  )  - 1— — x  (rt  —  j  )  0 

A(?-1)A(7-1)  A{q~l )  A(?-1) 


where  A  (q~1)=l—a  ,  q~l—  •  ■  •  —  q  nj  with  0=[a  j  •  •  ■  br,bn  ]r  being  constant 


(compare  with  A(n,</_1)  and  A(^~')  defined  earlier).  Define  the  (nj+n^+l)- 


dimensional  vector 


<m  ea  as  hsb 
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>'(«  .0)= 
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iv2(n  —j.Q) 


Biq  _ x  (n  — t  )+ - ! _ v(n  -i 


4  (</  1  ).4  iq  ’) 


.4  iq  1 ) 
x  in  —j  ) 


We  have 


A  iq  ') 

1  ^nu  .  0^  j  ^nh  . 

V„  (0)=0(n  ,0)e  (n  ,9)=Y  in  ,  0)[y  ](n  ,  9)—Y  in  ,  9)r  0]. 

i)  Smoothness  Condition: 

It  is  seen  that  Yin.  0)  is  a  filtered  (by 
Biq'1) 


=d>in  .0), 


(39) 


(40) 


1 


A  iq'1) 


Aiz-1) 

x  in  —i  )+v  in  —i  )  and  x  in  —  j  ).  and  hence  can  be  written  as 


■ )  version  of 


Yin  ,9)=Fi9)Yin-l.9)+GUin  ) 


where,  assuming  =n<, , 


F  (0)= 


*  « 

«1  •  ••  “i. 

1  0 
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x  in) 
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.Note  that  there  is  no  loss  of  generality  in  this  analysis  by  assuming  n7  =n„ .  For 
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nb  <nd  .  Y  (n  .  0)  as  defined  in  (39)  with  nb  =na  is  longer  than  what  we  need.  Obviously 
it  is  sufficient  to  show  that  Y  (n  .  0)  for  the  enlarged  nb  is  Lipschitz.  For  nb>na. 
Y  (n  .  0)  should  be  lengthened.  F  (0)  should  also  be  enlarged  by  appending  to  it  columns 
of  zeros  and  rows  that  are  similar  to  the  last  row  here.  However,  this  only  adds  more 
zero  eigenvalues  to  F(0).  and  changes  nothing  else. 

It  is  clear  from  the  form  of  jF(0)  that  it  has  one  zero  eigenvalue  and  double 

eigenvalues  which  are  the  roots  of  z"M  (z-1)  [56.  Chapter  2].  By  the  assumption  b), 

these  roots  are  all  inside  of  the  unit  circle.  It  is  also  easily  seen  that  F  (0)  is  Lipschitz. 
i.e..  there  exists  O^C  <oo  such  that 

||/'(01)-/’(02)||  ||  0,-02  II-  f°r  all  0,.02€Dc 

because  ||  ir(0,)— F(02)  ||  2=2£(a;(,)— ai(2>)2^2  ||  0!-02  ||  2.  Then  by  Lemma  1.  Y  {n  .0) 

i=l 

is  Lipschitz  for  all  n  .  Now  by  (40).  ||  V„  (0,)— Vn  (02)  |j  can  be  written  as 
||  (0,)-V„  (02)  ||  <  ||  Y  (n  .  0,)y  ,(n  .0,)-r  (n  .  02)y  ,(n  .02)  || 

+  ||  Y  (n  .  0j)y  ( n  .  0,)r  0,-T  (n  .  02)y  (n  ,  02)r  02  || 

=  ||  [10-  •  •Oly(n+1.01)y(n.01/-[lO  •  O]r(n+1.02)rU.02)r  || 

+  ||  Y  (n  .  0,)T  (n  .  0,)r  0,-r  (n  .  0^  (n  .  0t)r  02 

+y  ( n .  0,)y  (n .  0,/  02-y  (n .  02)y  u .  e2)r  02 II 

<  ||y(n+i.0,)y(n.0,)r-y(n+i.02)y(n.02)r  ||  +  ||y(n .0,)y u .0,)r  ||  ||0,-02|| 

+  ||  Y  (n  .  0,)y  (n  .  9i)r  -Y  (n  .  92)Y  (n  .  02Y  ||  ||  0,  || 

<  II  y  (n  + 1 .0,)  ||  ||  y  (n  .  0i)-y  (n  .  02)  ||  +  ||  y  (n  + 1 .0,)-*’  (n  + 1 .0,)  j|  ||  Y  ( n  .  02)  || 


E  ||  Vn  (0)  ||  2=E  [w  x(n  )-w  2(n  )+v  (n  )P  £  [w  3(n  -i  )+v  t(n  -i  )]-+  £  w  4(«  -y  )2 

'=1  2=0 


—E  £  {[w  3(n  )— w  2(n  )p+2[w  j (n  )— w  2(n  )]v  ( n  )+v  (n  )2}  ■ 
i  =1 


[w  3(n  —  i  )2+2w  3(n  — t  )v  j( n  —i  )+v  j (n  —i  )2] 


+  Z  )-w2(n  )p+2[w  j(n  )-w,(n  )]v  ( n  )+v  (*  )2)w4(rt  -j  )2 

j=0 


^  Z  p  i^n  )_vv2^  )PW3(«  —i  P)+2  \E[w  J (n  )— W,(n  )pw3(n  —  t  )v,(n  —i  )| 


+£{[wi(n  )— w  2(n  )Pv  i(n  -i  )2}+2|£[w  i(n  )-w2(n  )]w3(n  -i  )2v  (n  )| 


+4 1 E  [w  j(n  )— w  2(n  )]w  3(n  —  i  )v  ( n  )v  ,(n  —  i  )  | 


+  E  |v  (n  )2w3(n  —  i  )2}+2  |£w  3(n  —i  )v  (n  )2v  j(n  —i  )|+iT[v  (n  )v  t( n  —i  )p 


) 


If  l[w  j(n  )— w 


2(n  )]2w  4(n  —  j  )2}  4-2  |  E  [w  j(n  )— w  2(n  )]w  4(n  —j  )2v  (n  )  | 


+  Ew  4(n  —  j  )2v  (n  )2 


By  the  assumptions  and  Lemma  2,  we  see  immediately  that  all  terms  above  are  finite 
for  all  QeDc  .  Hence  condition  ii)  is  also  satisfied. 

iii)  Mixing  Condition: 

The  sequences  considered  in  Benveniste's  theorem  are  all  two-sided,  whereas  in  a 
physically  realizable  situation  as  ours  all  sequences  start  from  n  =0.  i.e.,  one-sided,  as 
considered  in  Lemma  1.  Obviously  it  makes  no  difference  in  our  proof  for  i)  and  ii). 
This  is  also  the  case  for  part  iii)  because  of  the  equivalence  of  0-mixing  for  one-sided 
and  two-sided  sequences  [79.  p.  169].  For  convenience,  we  will  consider  two-sided 
sequences  in  this  part  of  the  proof  This  also  gives  a  slightly  more  general  taste.  All 
one-sided  sequences  can  be  thought  of  as  two-sided  with  zeros  for  n  <0. 
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The  sequence  {£„  }  here  is  U  (n  ),v  (n  )}  which  is  0-mixing  by  the  assumption. 
Thus  we  only  need  to  show  the  existence  of  the  sequence  {vm  }.  We  first  define  V„m(0). 

Let  /ij(n  .9)  be  the  unit  pulse  response  of  ^  -  —  .  h2(n  .9)  be  the  unit 


pulse  response  of  — - — — 
F  ^  Mz-1) 


.  Then 


y  !(«,§)=  £  h  i(n  '.Q)x  (n  —  ri  )+  £  h  2(n  ‘  .9)v  (n —n  ) 


y2(n,§)=  £  h2(n  ',9)x  (n  —  n  ). 


Define  the  truncated  versions  of  y  i  and  y  2  as 


y  f(n.9)=  £  h  '  ,9)x  in —n' )+  £  h2(n  ’.0)v  (n  —  n' ) 


y2  (n.§)=  Y.  h2(,n‘.9)x(n-n 


Analogous  to  (39).  we  define 


Ym(n.9)= 


yT(n-i  . 9 ) 
y?(n-y  .9) 


1  ^na 
0^  /  ^ nb 


Then  it  is  natural  to  define 


V'„"(e)=l""  (n  .  0)[yf  ( n  .  9)-Ym  (n  .  9)r  9). 


We  are  now  ready  to  calculate 


[E  ||  (0)-V-(0)  ||  2]1/2=  \E  ||r(n.e)y,U.0)-y"(rt.0)yr(n  0) 


-[Y(n.  9)Y  ( n  .  9)r -Ym  (n  .  9)Ym  ( n  ,  9)r  )0  ||  2  1 
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<  2E  ||  Y  (n  .  9)y  t(n  .  9)-Ym  (n  .  9)yT  (n  .  0)  ||  2 

1/2 

+  2 E  |jy(n.0)r(n  .9)r-Ym(n  .9)Ym{n.9)T  ||  2  ||0||2 

r  A  A  ^  a  11/2 

^  'll  Lf  jj  Yin  .  9)y  t (n  .  9)—Ym  (n  .  9)y  f  ( n  .  9)  ||  2 

_  „  |  „  „  „  1/2 
+  >/2  |]  0  ||  p  ||  Y  in  ,9)Y  {n  .9)T  -Ym  in  .9)Ym  in  .9f  ||2 

where  the  inequality  (a  +b  )2^2 (a2+b2)  is  used.  It  is  more  convenient  at  this  point  to 
lump  detailed  calculation  into  a  lemma  whose  proof  is  also  contained  in  the  Appendix. 


Lemma  3: 

With  the  assumptions  of  Theorem  1.  there  exist  0<ai,a2<oo  and  0<p  <1  such 

that 

r  »  1/2 

t)  p  \\Y(n.9)Y(n.9)T-Ym(n.9)Ym(n.9f  ||  2  ^alPm  .  and 

ii)  ||  Y  (n  .  9)y  ,(n  .  §)— Ym  (n  ,  9)y  T  (n  ,  9)  ||  2  ^a2 pm 

for  all  9eDc . 

It  is  then  obvious  that  we  can  let  v^n=\f2  (ajmax  ||  9  ||  +o ->)pm  and 

9fDc 

£  V™  <  CO 
m  =0 

since  0  <  p  <  1 .  This  concludes  our  proof. 

Remarks: 

1)  Notice  that  in  the  proof  many  common  restrictions  are  not  imposed,  e.g..  i)  x(n  ) 
and  v(n  )  do  not  have  to  be  independent:  ii)  v{n  )  may  not  be  white:  iii)  .4  (9-1) 


i 
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and  B(q~x)  (also  A  (q~x)  and  B(q~1))  do  not  have  to  be  coprime:  and  iv)  nothing 
is  said  about  the  orders.  Thus,  this  convergence  is  quite  general,  although  this 
generality  will  be  somehow  reduced  when  we  study  9'  (r  )  (see  Theorem  2  in  the 
next  section). 

2)  0-mixing  implies  uncorrelation  when  separation  is  large  (oo),  e.g.,  a  sequence  of 
independent  variables  is  0-mixing,  an  m-dependent  sequence  is  also  0-mixing  [79], 
0-mixing  also  implies  ergodicity.  see.  e.g..  [80].  Note  that  if  x  (n  )  and  v(n)  are 
independent  of  each  other,  then  (x  ( n  ),v  ( n  )}  being  0-mixing  implies  each  of 
{x  (rt  )}  and  {v  (n  )|  being  0-mixing. 

3)  Since  the  estimate  9„  is  restricted  to  fall  into  Dc  for  all  n  .  a  monitoring  device 
has  to  be  incorporated  into  the  algorithm  ((25)  or  (31))  to  project  ©„  back  into  Dc 
once  it  gets  beyond.  This  might  be  computationally  expensive  and.  in  fact,  is  not 
well  solved  yet  [67],  On  the  other  hand,  as  long  as  (28)  is  satisfied,  r  can  always 
be  selected  such  that  Ce(r)  in  (29)  and  (30)  is  much  less  than  the  least  distance 
from  9  (r  )  to  the  boundary  of  Ds .  Then  the  probability  of  the  filter  becoming 
unstable  can  be  made  arbitrarily  small  in  theory.  This  is  confirmed  by  computer 
simulations  in  Section  2.4. 

4)  The  requirement  (28)  is  essential.  What  happens  if  (28)  is  not  satisfied  can  be  seen 
as  follows.  Suppose  9  (t )  exits  D.  at  a  boundary  point  0.  Then  by  (29).  0„  will 
attempt  to  exit  Dc  at  somewhere  near  0.  However,  the  projection  device  [see  4) 
above]  will  pull  it  back  into  Dc .  Because  of  (29).  0„  will  try  to  exit  again.  An 
infinite  number  of  repetitions  of  this  process  result  in  0„  converging  (in 
probability)  to  the  boundary  point  0.  instead  of  tracing  9‘  (t  )  further,  as  also 
noted  in  [57.  Section  4.3.3].  Satisfaction  of  the  requirement  (28)  has  not  yet  been 
well  studied,  although  such  difficulty  has  not  been  seen  in  our  computer 
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simulations.  However,  it  is  certainly  crucial  for  convergence  and  should  not  be 
omitted  as  in  [70]. 

5)  As  pointed  out  in  [7l].  5  in  (29)  and  (30)  can  never  be  infinity.  This  may  not  be 
very  restrictive  since  in  many  applications  the  algorithm  is  to  be  turned  off  after  a 
finite  number  of  adaptations. 

23.3  Stability  of  the  ODE 

Having  established  (29)  and  (30),  the  next  step  is  naturally  to  study  the  ODE  (27) 
and  its  solution  0  (r ).  A  desirable  property  is  the  parameter  convergence,  i.e..  if 
Iim0  (r  )=0‘  exists,  would  it  be  the  same  as  the  true  parameter  vector 

0=[a  !'•  a,j  b0  ■  ■  ■  b„t  7 eDc  ?  We  consider  "sufficient  order"  case  and  "reduced  order" 
case  separately. 

A)  Sufficient  Order  Case: 

Sufficient  order  means  n '  =min  {na  —na  .  nb  —nb  }  ^0.  In  this  case  D  CDc .  It  can 
then  be  written  0=[a  t  •  a^O  •  0  b0b ,  •  •  b„d0  ■  ■  07 iDc  .  For  n’  =0.  the  global 
stability  of  the  ODE  is  guaranteed  by  the  following  theorem. 

Theorem  2: 

Let  n  =0.  Assume  that  the  stationary  processes  x  ( n  )  and  v  (n  )  are  independent 
of  each  other,  and  A  (r-1)  and  are  coprime.  Let  the  stability  conditions  a)  and 

b)  in  Theorem  1  hold  (substitute  0(r )  for  0„  in  b)). 


w 

A 


i)  If  v  (n  )  is  white  and  x  (n  )  is  persistently  exciting  of  order  m  +1  with  m  =na  +rt6  . 
i.e..  there  exist  0<px.p2<oo  such  that 


0 <Pj/ 


[x  (rt  )  •  •  x  (n  —m  )]<p2/  , 


x  («  —  m  ) 


then  the  ODE  (27)  is  stable  and  9'  ( t )  converges  to  the  true  parameter  value  9. 

ii)  If  v  (rt  )  is  not  white,  then  the  ODE  is  not  guaranteed  to  be  stable  and  the  estimates 
(if  they  converge)  are  generally  biased. 

Proof: 

For  simplicity,  we  drop  the  time-variable  t  in  the  notation  9(t  ). 
i)  Observe  the  identity 

— U  )=  — _?il_2__x  (rt  -i  )  — 1  x  U -j  )  0.  1  ^rta  .0 < nb 
Mq~l)A(q -1)  A  (q~l)A  (q~l)  A(q~l ) 

and  consider  the  RHS  of  the  ODE.  Since  x  (rt  )  and  v  (rt  )  are  mutually  independent,  we 
have  (using  (38)  ) 

£[V„(0)]=£[0(n  .9)e(n  .§)] 


- L{±21 - x(rt-i) 

A  {q  )A  }  B(q~l)  ,  ,  v 

l  ,  . ,  /1(?_1M(?-1) 


- x  (rt  -i )  - L—X  (rt  -j )  0  +E 

A  (q~l)A  (q  ~l)  <4(y_1) 


A{q~') 


v  (rt  — i  ) 


v(rt  ) 


B 

I 
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0  b  ci>  i  •  ■  •  bn 


*$(AJ)=  Obobr--b„t 

1 -a, - anj 

*.  0 

1  -a  , - anj 


...  .  Q 

*o 'MfV 

S  (A  ,B  )=  1  -a  , - a. 


1  “a  i - 


Obobr-bni 
•  •  • 

‘o 

1  -« i - a. 


if  na>na,nh  =nh , 


if  na  —  ^  ^6  » 


if  *„  =n„  *  • 


1  -o  i - <V 


Since  A  ( z  *)  and  B(z  *)  are  coprime,  the  Sylvester  matrix  £)(A  .B)  for  these  three 
cases  are  all  non-singuiar  [56],  [S 1  ].  The  matrix  in  the  middle  of  (44)  is  positive 
definite  for  all  0(r  )tDc  by  persistent  excitation  assumption  and  [57,  Lemma  4.7J.  Thus, 
from  the  structure  of  the  matrix  (44),  we  see  that  it  is  positive  definite  for  all  9(t  )eDc . 

We  now  choose  V  =  y  ||  9— 9  ||  2^()  as  the  Lyapunov  function  of  the  ODK.  The 
derivative  of  V’  is 


^L=(Q-QY  m 

dt  dt 


B(q  1 )  x  ^ 

=-(6-9VE  A^A{q~^  Big-')  {n  _  ■  )  1  x(n-j)  (6-0) 

—1 _ x(n-j)  Mq~')Mq-')  A(q~l) 

A(q~l) 


with  equality  iff  0=0  at  which  point  -^-=0.  Thus  the  assertions  in  i)  are  proved. 


ii)  For  v(n)  non-white,  one  cannot  derive  (43)  from  (42).  Thus,  it  may  not  be 


feasible  to  find  a  Lyapunov  function.  If  we  consider  possible  convergence  points,  i.e.. 


stationary  points  of  the  ODE.  it  is  seen  that  E[Vn  (0)]=O  results  in 


0=0+ 


Big-1)  , 

- L—xU-y)  A(q  l)A  (?  *)  A  {q  J) 

A  (q~l) 


A  {q  *) 


v  (n  —  i  ) 


which  does  not  lead  to  0=0  and  hence  0  (if  exists)  will  be  biased  and  may  even  go 


beyond  the  region  Ds . 


Remarks: 


1)  Theorem  2  together  with  Theorem  1  give  a  rigorous  convergence  proof  for  the  IF 


algorithm  in  the  sense  of  (30)  with  0‘  replaced  by  0. 


2)  The  error  surface  may  not  be  unimodal  even  for  the  sufficient  order  case  [48]. 


Thus.  Theorem  2  proves  global  convergence  regardless  of  local  minima  for  this 


case  (error  surface  case  2)). 


3*  V*  V 


s 


l 

a 


3)  For  n'  >0.  it  was  shown  in  [69]  that  the  stationary  points  of  the  ODE  are  given  by 
A  (z~1)=A  (z~l)L  (z~1)  and  B  (z~l)=B  {z~x)L  (j-1)  where  L(z~l)  is  a 
polynomial  of  degree  n’  with  zeros  inside  the  unit  circle. 


B)  Reduced  Order  Case: 


In  this  case  n*  <0.  If  all  conditions  in  Theorem  2  i)  are  met.  the  stationary  points 
will  satisfy 

E  {0(n  .9)e  ( n  .§)} 


Big-1)  ,  _.s 

-E  A  (q~l)A  (?_1)  ^ 

- !__*( n-j)  A  (q~l)A  (q~l) 

A(q~l ) 


—  Y — rr-x  (n  —i )  — -1  X  ( n  -j  )  9 

A  (q  ^ACq'1)  A  (q  *) 


B(q  J)  (  ^ 

-E  ‘  ..  x  (n— 1) 


<4  (?_1) 


x(n  ) 


(9-1) 


/l  (^_1M  (?_1) 


x (n  — na ) 


<4  (?-1) 


x  ( n  —nb  ) 


v  -  >  . 1  »  "  Pi  r 


I 


I 

I 


ft 


1 
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A  iq-1) 


x  in  ) 


1 


—E 


B  iq  _1) 


Aig'1) 


x  ( n  —i  ) 


x  in  —  nb  ) 


0 


/i  ig  ‘M  (q  ’) 

1  x  (n  —j  ) 


A  iq  J) 
which  implies 


B(g~l) 


Aig  ')A  ig  *) 


x(n—i) 


A(g~') 


x  in  —j ) 


0  =  0  (47) 


i 

B(«")  x (n  — i  ) 

0= 

E 

A  iq  l)A  iq  *) 

Biq  l)  ,  xin-i)  1  xin-j) 

y l 

1  ,  xin-j) 

A  iq~')A  iq  *)  A  iq  *) 

£ 

Aig-1) 

-l 


Aig  l)Aiq  ’) 


Big-1)  ,  - 

— JL_ - _x  in  -na  ) 


A  ig  iq  l) 

— /  _,x  x  (n  ) 

A  ig  x) 


x  in  —nb  ) 


A  ig  ') 


-.x(n-l) 


S(?-) 


A  iq  lM  iq~l) 


A  ig~')A  ig-1) 


x  in  —n. ) 


S 

I! 

'$ 

I 

S: 

n 

Cj 


Aig-') 


in  ) 


1 


A  iq  *) 


x  in  —nb  ) 


0. 


(48) 


Note  that  the  second  matrix  in  (48)  is  not  symmetric  any  more  since  n'  <0.  Thus.  (48) 
is  now  a  complicated  non-linear  equation  in  0,  and  little  can  be  said  about  it  at  present. 
However,  it  should  be  noted  that  the  matrices  in  (48)  still  have  a  special  symmetric 
form  due  to  the  use  of  U  in  )  (see  Proof  of  Theorem  1,  0)  in  <t>  vector  instead  of 

f7(n)=[5(?-1)x(n-l)+v(n-l)  xin  )F  used  in  most  other  algorithms  such  as 
Aig-1) 
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Stearns'  algorithm  and  the  SHARF.  This  is  a  unique  feature  characterizing  the 
underlying  equation  error  approach  [67],  which  may  contribute  to  the  global 
convergence  as  will  be  seen  in  the  next  section. 

2.4  Computer  Simulation 

A  number  of  computer  simulations  were  performed  for  both  AFM  algorithm  and 
IF  algorithm.  For  most  cases,  these  two  algorithms  are  indeed  indistinguishable  from 
casual  observance.  The  only  slightly  noticeable  difference  occurs  when  the  adaptive 
coefficients  migrate  slightly  outside  the  stable  region.  In  this  case,  although  the  adaptive 
algorithm  itself  has  the  ability  to  pull  them  back  into  the  stable  region  if  they  are  not 
too  far  away  from  the  boundary  (see  Figure  7).  the  adaptation  process  becomes  unstable 
so  that  the  adaptive  coefficients  change  more  drastically.  If.  however.  T  is  selected  such 
that  the  adaptive  coefficients  strictly  remain  inside  the  stable  region,  the  two  algorithms 
were  found  to  always  behave  exactly  the  same  for  all  the  computer  simulations 
performed.  Since  it  is  redundant  to  present  simulations  for  both  algorithms,  only  those 
results  using  the  AFM  algorithm  will  be  presented. 

Let  us  consider  first  the  simplest  situation:  error  surface  case  1).  Figure  4  shows 
one  example  of  Stearns'  experiments  [47]  for  this  unimodal  error  surface  case.  The 

transfer  function  of  the  dynamic  plant  is  H.  (;-1)= - ^ T  and  that  of  the 

J  *  p  1—1.2^  -'-F0.6r 

filter  is  H f  ( n  s'1)- - ? - The  input  x  (n  )  is  a  unit  variance  white 

;  l-a,(n  )z~l—a2(n  )~~2 

Gaussian  pseudonoise  sequence.  And  the  disturbance  v(n)  is  set  to  zero.  The  error 
surface  is  normalized  to  one  and  is  shown  by  the  contours.  Starting  from  an  arbitrary 
point  (a  jCO),  a  2(0))  =  (— 0.8.  —0.2).  convergence  is  obtained  for  r«0.001.  It  was 
observed  that  if  r  is  too  large,  the  adaptive  coefficients  will  go  beyond  the  stable  region 
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and  the  filter  will  "blow  up."  For  r  smaller  than  0.001  the  filter  is  always  stable,  and 
slower  convergence  results.  A  plot  of  the  error  progression  is  shown  in  Figure  5  where, 
for  comparison,  the  equation  error  is  computed  by  passing  the  output  error  in  Figure  3 
through  the  adaptive  all-zero  post-filter  as  in  Figure  2.  It  is  seen  that  both  the  output 
error  and  the  equation  error  approach  zero  after  1500  adaptations. 

The  same  experiment  is  performed  for  v(n)  a  zero  mean,  unit  variance  white 
Gaussian  noise  sequence.  A  noisier  version  of  the  adaptive  coefficient  migration  is  seen 
in  Figure  6.  It  is  seen  that  at  the  convergence  point,  the  adaptive  coefficients  wander 
around  the  true  value  with  a  larger  variance  as  opposed  to  Figure  4  due  to  the  non-zero 
disturbance,  and  hence  demonstrating  convergence  in  probability  as  given  in  (30).  For  a 
plot  of  error  progression,  see  the  dashed  lines  in  Figures  14  -  16. 


Figure  6  Same  as  Figure  4  with  non-zero  disturbance 


For  the  error  surface  case  2).  colored  noise  input  is  obtained  by  filtering  a  white 
noise.  To  ensure  the  0-mixing  requirement  and  the  input  richness  requirement,  the 
coloring  filter  used  is  FIR.  The  filtered  sequence  is  an  m-dependent  sequence  and  its 
frequency  response  is  never  zero.  Figure  7  shows  a  multimodal  situation  constructed 


by  Soderstrom  [48]  where  Hp  (z  l)=- - — .  H t  (n  ,z"1)= 

P  (1— 0.7z-1)2  ’ 

b  0(n  ) 

- — — - — - — — — - — and  the  coloring  filter  is  (1— 0.7z  *)2  (1+0.7Z-1)2.  with 

1—  a  t(n  )z  l—a  2(n  )z  2 

v(n)=0.  Two  minima  exist,  one  being  0.9475  (local)  and  the  other  0  (global).  A 
global  convergence  path  is  shown  for  4000  iterations  starting  from  near  the  local 
minimum  point.  Note  that  this  is  also  achievable  by  SHARF  (because  the  filter  orders 
are  sufficient)  and  LMSEE  (because  e  ( n  )=0  at  the  global  minimum  point)  but  not 
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Figure  7  Global  convergence  path  for  a  t(n  )  and  a  2(n  ),  sufficient  order 

with  colored  noise  input 


achievable  by  Stearns'  algorithm. 

The  above  examples  merely  demonstrate  the  proved  convergence  in  the  last 
section.  The  results  are  not  surprising  also  because  they  can  be  achieved  using  some 
other  algorithms  as  mentioned.  However,  the  examples  presented  below  are  unique  and 
interesting,  i.e..  global  convergence  regardless  of  local  minima  for  reduced  order  cases. 


Three  examples  of  case  3)  are  shown  in  Figures  8  -  12.  Figure  8  shows  the 

example  considered  by  Johnson  et  al.  [45]  where  Ha{z~l)= - 0-05— 0.4z - 

F  p  1— 1.1314r  ~l+0.25z~2 

H,  (n.z_1)= - ^ v(n)=0.  and  the  input  is  again  white  noise.  Again  two 

1— a  !(n  )z 


minima  exist  (0.976  and  0.277).  Starting  from  the  local  minimum,  global  convergence  is 
achieved  after  6000  iterations  (r =0.0006).  This  result  is  remarkable  in  that  it  was  not 


Figure  8  Global  convergence  path  for  a  )  and  b0(n  ).  reduced  order 

with  white  noise  input 


achieved  by  any  other  algorithm  [45],  [46],  [62],  The  error  progression  is  shown  in 
Figure  9.  It  can  be  seen  that  minimization  of  the  output  error  does  not  lead  to 
minimization  of  the  equation  error,  and  vise  versa  (compare  the  two  errors  at  about  the 
6000th  iteration  and  at  about  the  3200th  iteration).  Another  example  of  this  case 
(Stearns,  [47])  is  shown  in  Figure  10  where  Hp  (r-1)=l+10r-1, 
,  b0(n ) 

Hf  \n  j:  x)= - - - - - - — v(n)=0,  and  the  input  is  white  noise.  In  this 

;  1  -a1(n)z-l-a2(n)z~2 


case  the  values  at  the  two  minima  are  very  close  (0.656  and  0.498).  The  global 
convergence  is  again  achieved  for  more  iterations  (12000).  The  same  experiment  as  in 
Figure  10  is  then  repeated  for  v  (n  )  a  white  Gaussian  noise  with  zero  mean  and  variance 
50.5  (normalized  to  0.5  [47]  in  these  two  figures).  Again  the  same  kind  of  global 
convergence  is  observed  in  Figures  11  and  12. 


Figure  9  Error  progression  of  Figure  8 
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Figure  12  Error  progression  of  Figure  1 1 
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Figure  13  Global  convergence  path  for  a  ,(n  )  and  a:(n  ).  reduced  order 

with  colored  noise  input 
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Finally,  an  example  of  the  error  surface  case  4)  is  obtained  by  extending  the 

example  in  case  2)  to  H .  (z_1)=- - ^ H f  in  ,z-1)= - — — -  ~  \  - — - — and 

'  (1— 0.6z-1)3  1  1— a  i(n  )z  ~1—a2(n  )z~2 

the  coloring  filter  (1— 0.6z-1)2  (l+0.6z-1)2.  and  is  shown  in  Figure  13.  Note  that  the 

global  minimum  value  is  no  longer  zero  (0.016).  Global  convergence  is  again  observed. 

It  should  be  noticed  from  these  simulations  that  the  convergence  paths  exhibit 
gradient  nature  (of  the  steepest  descent  method)  near  convergence  points.  This  confirms 
the  statements  below  (19). 

2.5  Convergence  Rate  and  Other  Issues 

Two  remaining  issues  concerning  the  convergence  of  the  IF  algorithm  are 
considered  in  this  section:  convergence  rate  and  the  coloring  effects  of  the  disturbance 
v  (n  ).  Again,  the  fact  that  the  AFM  algorithm  is  a  close  approximation  of  the  IF 
algorithm  for  small  r  should  also  be  true  for  these  issues.  Here,  only  sufficient  order 
case  will  be  considered. 


2.5.1  Convergence  Rate 


The  convergence  rate  will  be  studied  for  the  ideal  convergence  case  only,  i.e..  when 
the  conditions  in  Theorem  1  and  Theorem  2,  i)  are  met.  Assume  r  small,  then  9„ 
closely  approximates  0  (r )  in  the  sense  of  (29)  as  proved  in  Section  2.3.  Thus,  to  study 
the  convergence  rate  of  9„  .  it  suffices  to  investigate  the  behavior  of  0'  U  ).  From  (38) 
and  the  identity  in  the  proof  of  Theorem  2.  i)  (page  38). 


e  (n  ,0)= 


Big-1) 

A  (g~l)A  (?_1) 


x  (n  )— 


B(g~l) 

A(q-l)A(q-1) 


x(n—i) 


— -1— — -x  (n  —j  )  0+v  (n  ) 
A  ( q~l ) 


.  .*  -  !■  ■ 


1  ^  i  ^  na  .  0  =$  j  ^  nh 


i  ~  — 7~ — i)  — -1  -  x  (ft  — y  )  (§-fl)  4-  v  (n  ). 

A(q~x) 


Since  {*  (n  )}  and  (v  (n  )}  are  independent. 


E  ( e  (n  ,9)2}  =  E  (v  (n  )2} 


g(;-‘)  ,(n-0 

+(9-0)r£  M  “  M<,  ’  — 11  1  (9-9) 

1  ,  .  ^4  (?-1M  (9"1)  A{q  x) 


where  V  =i-  JJ  9—6  Jj  2,  and  (J2~E  {v  (n  )2}  is  the  variance  of  the  noise  sequence,  see  (45). 

X 

A  A  ^ 

Now  if  0(t  )=0  (r  ).  the  quantity  V  (r  )  then  represents  the  (mean  square)  parameter 
error  and  E{e(n  .0)2}!^_g*(( )  the  (mean  square)  output  error,  and  they  are  related  by 
(49).  It  can  be  shown  that  they  are  both  exponential,  which  was  already  observed  by 
the  computer  simulations  in  Section  2.4.  In  fact,  we  can  write 

-4^  =[0*  (f  HF  R  [0‘  a  )P‘  a  )-0]  (50) 

at 

where  R(6)  is  given  in  (44).  By  the  assumptions  and  the  argument  below  (44). 
R  [6*  ( t  )]  is  positive  definite,  i.e.. 

A-,/  </?[§■  (r  )]< AT27 . 


and  hence 


Tir ,  ||  0*  (r  )-d  II  1|  e*  (r  )-©  II 2. 


s 
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i.e.. 


—IK  XV  (t  )^  dV}l) 
1  dt 

Z-2K2V{t  ). 

(51) 

Solving  (51)  yields  the  bounds  for  V  {t  ): 

Vu{t  )=V{0)e'2Kl‘ 

{Upper  bound  ) 

(52a) 

V,{t  )=V(0)e"2*1' 

{Lower  bound  ). 

(52b) 

Because  of  (49).  the  bounds  for  E  { e  [n  ,0*  ( t  )]2}  are  similarly 

E\e  [rc  .0”  {t  )]2}u  =2K2V{0)e~2K l>  +crv2  {.Upper  bound  )  (53a) 

£  {e  [n  ,0*  (r  )]2}/ =2A'1V(0)e  2Xj,+<r2  {Lower  bound).  (53b) 

The  time  scaling  between  the  ODE  (continuous)  and  the  adaptation  process  (discrete)  is 
t  -n  T. 

The  above  bounds  are  of  little  practical  value,  however,  because  Kx  and  K 2 
depend  on  x  (n  ).  A  ’  {q  -1),  A  {q  -1).  and  B{q~l)  and  are  difficult  to  compute  in  general. 
For  a  given  set  of  data  \x  (n  )}.  iy  (n  )}.  one  can  only  hope  to  approximate  R  [0*  (f  )]  with 
a  constant  matrix  and  to  solve  the  original  ODE  as  a  first  order  linear  differential 
equation.  This  approximation  might  be  very  rough  because  0‘  (r )  may  change 
considerably  during  the  adaptation.  However,  locally  it  can  still  be  accurate  and  worth 
investigating. 

The  simulation  example  presented  in  Figure  6,  Section  2.4  is  used  to  demonstrate 
the  convergence  rate  discussed  above  (for  the  values  used  in  the  example,  see  page  46). 
To  obtain  V  {t ).  a  suitable  constant  approximation  for  R  [0'  {t )]  is  made  and  the  ODE  as 
given  by  (27)  and  (43)  is  solved  for  0‘  {t )  [82]: 
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0'  it  )-©=£>'* leV  )l(,_'o)[0‘  (r 0)— 0].  (54) 

Then  V  it  )=-~-[0  it  )— 0]r[0  it  )— 6],  The  mean  square  output  error  is  calculated  using 

(49)  and  (50)  and  plotted  on  the  time  scale  t  =n  r.  Three  different  approximations  for 
R  [0*  (r )]  are  used  and  their  results  are  shown  in  Figures  14  -  16.  In  Figure  14.  the 
matrix  ^  [0  it  )]  is  calculated  using  A  (9-1)=l— \.2q~l+0.6q~2  in  place  of  A  iq~l )  and 
Aiq~l )  (see  (44)  for  the  expression  of  ^  (0);  5(g-1)=l  in  this  example).  In  other 
words.  A  iq  ~l)  is  approximated  by  its  final  value.  Hence,  the  above  calculation  should 
at  least  be  accurate  locally  around  the  convergence  point,  i.e.  for  t  large.  Three  curves 
are  shown  in  Figure  14:  a  sample  curve  of  the  mean  square  output  error  of  the 
simulated  stochastic  adaptation  process  (using  ©„  ;  dashed  line),  the  ensemble  average  of 
20  such  curves  (solid  line),  and  the  predicted  convergence  curve  for  the  mean  square 


error  calculated  using  the  above  method  (solid  line).  As  expected,  the  predicted  curve 
coincides  with  the  ensemble  average  fairly  well  for  n  large,  poorly  for  n  small  (at  the 
beginning  of  adaptation).  For  the  migration  of  0„  see  Figure  6.  Next,  the  initial  guess 
made  in  Figure  6.  A  (0.^-1)=l+0.8^-1+0.29-2  is  used  in  place  of  A  ( q~l )  while  A  (y-1) 
is  still  the  same  as  before,  i.e..  A  (q -1)  is  approximated  by  its  initial  value.  The  same 
group  of  curves  is  shown  in  Figure  15.  Again,  as  expected,  the  predicted  curve  coincides 
with  the  ensemble  average  very  well  at  the  beginning  of  the  adaptation,  and  poorly  for 
n  large. 

Although  these  two  plots  are  still  not  practical  since  A(<?-1)  is  unknown  in 
reality,  they  do  give  some  insight  into  the  rate  of  convergence  for  the  IF  algorithm.  A 
practical  approximation  for  R  [0*  ( t  )]  can  be  made  by  replacing  the  known  operator 
A  (0^-1)=l+0.89-1+0.2^ ~2  for  both  A(q~l )  and  A(q~l).  The  result  is  shown  in 


Figure  15  Convergence  rate  with  A  ( q~l )  approximated  by  A  (0 .q~l) 


Figure  16,  where  it  is  seen  that  the  predicted  curve  using  this  approximation  is  not 
satisfactory  at  all.  A  more  accurate  approach  would  be  to  solve  the  non-linear  ODE 
without  approximating  /?[0‘  (r  )],  which  is  inconceivable  at  this  moment. 

2.5.2  Coloring  Effect 

Let  the  conditions  in  Theorem  1  and  Theorem  2.  ii)  be  met.  As  pointed  out  in  the 
proof  of  Theorem  2,  ii).  for  a  non-white  disturbance  {v  (n  )}  it  is  difficult  to  analyze  the 
ODE.  and  the  possible  convergence  points  are  biased.  It  is  the  purpose  of  this  subsection 
to  heuristically  discuss  some  necessary  conditions  for  the  bias  to  remain  small. 

From  (46).  it  is  seen  that  the  parameter  bias  is  caused  by  an  additive  vector 
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is  given  in  (44).  In  order  to  keep  ||  A  ||  small,  it  is  necessary  that 
be  small.  Using  (44).  the  following  can  be  written: 
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A  ( q  *)A  ( q  !) 


x  (n  —m  ) 


£>{a  .b  )-‘ 


(56) 


Consider  the  factor  in  the  middle  first.  The  matrix  in  the  bracket  is  at  least  positive 
semi-definite  because  of  its  structure.  In  order  to  be  invertible,  it  has  to  be  positive 
definite  and  hence  the  persistent  excitation  condition  (41)  must  be  imposed  on  the  input 
U  (n  )|  as  argued  before  [57.  Lemma  4.7].  This  is  the  same  requirement  as  for  the  white 
disturbance  case  (Theorem  2.  i)  ). 

Next,  consider  the  first  and  the  third  factor  in  (56).  As  mentioned  before  and 
proved  in  [56]  and  [S 1  ].  «3(A  ,B  )  is  non-singular  if  and  only  if  A  (g_1)  and  B  (9-1)  are 
relatively  prime.  Moreover,  following  [56,  Theorem  A.4.1]  it  can  be  shown  that  for 
||  <S(A  ,B  )_1 1|  to  be  small,  the  zeros  of  A  ( q~l )  should  not  even  be  close  to  the  zeros  of 
B(q~l).  and  vice  versa.  Let  y  be  a  root  of  A  (g-1).  and  y+e  be  a  root  of  B(q~1): 

A  (q  _1)=(  l~y q  _1X  l~a  \q  - a  q  ~nj  +1) 

B(q-l)=(\-yq-l-eq-i)(b'0+b\q~l+  ■  ■  ■  +b  ‘„t  -vq~nb  +1). 

Eliminating  (1— y?-1)  gives 

A  (q~l)(b  0+6  ■  ■  +b  +1)— (7  -1)(  1— a  \q  ~l - -a  ,„j_1?~nj+1) 


=eq  Hi—  a\q 
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Equating  coefficients  of  q~'  on  both  sides  gives 


£(A  ,B  Y  9  =€// 1 


(57) 
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where 

0‘=[-l  a,'  •  •  anj_i  0  b0'  ■  ■  ■  6^-i‘F 

H  =[  0  0  b  'o  b \—a  \b  '0  ■  —a  ’„b  _jF  • 

Taking  the  norm  on  both  sides  of  (57)  gives 

||0'||=€||  &U.B)-T  ||  ||tf'||  ■  (58) 

Now  ||  0'  ||  and  ||/f’||  are  of  moderate  value.  Hence,  if  €  is  small,  i.e..  the  zeros  of 
A  ( q~l )  are  close  to  the  zeros  of  B  ( q~l ),  ||  ^S(/l  ,B  )~T  ||  has  to  be  large,  thus  proving 
the  assertion.  Note  that  this  is  actually  a  sensitivity  criterion,  i.e.,  the  closer  the  zeros 
of  A  (^-1)  are  to  the  zeros  of  B  (^_1).  the  larger  the  bias  if  v  (n  )  is  colored. 

The  two  necessary  conditions  derived,  i.e..  (x  (n  )}  being  persistently  exciting  and 
the  plant  not  having  close  pole-zero  pairs,  are  very  common  in  other  situations.  For 
example,  if  the  plant  has  close  pole-zero  pairs,  it  would  not  be  surprising  to  see 
behavioral  difficulties  in  parameter  convergence  for  any  adaptation  algorithm. 

It  should  be  pointed  out  that  if  the  correlation  of  the  disturbance  is  known  a 
priori,  pre-whitening  filters  can  be  used  to  decorrelate  the  colored  noise  as  proposed  in 
[83], 
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3.  AN  APPLICATION  TO  ECHO  CANCELLATION 


The  reason  for  the  rapid  development  of  the  adaptive  filtering,  as  stated  in 
Chapter  1.  is  that  it  has  a  great  potential  in  wide  range  of  applications  due  to  its  unique 
advantage:  adaptability  to  the  (changing)  environment.  The  areas  of  application  for 
adaptive  filters  are  tremendous  [4]  -  [29],  For  adaptive  FIR  filtering,  the  usual  objective 
for  most  applications  is  the  minimization  of  the  mean  square  output  error  to  obtain  the 
Wiener  solution.  This  proves  to  be  a  reasonable  objective  and  is  practically  useful.  For 
adaptive  HR  filtering,  however,  more  is  desired.  Since  an  HR  structure  can  better  model 
a  real  physical  system,  one  of  the  purposes  for  using  adaptive  HR  filters  is  to  further 
minimize  the  mean  square  output  error.  Moreover,  for  the  sufficient  order  case, 
parameter  identification  is  often  desired,  in  which  case  the  mean  square  output  error  can 
be  reduced  to  zero.  However,  it  can  also  happen  that  one  can  achieve  a  reasonably  small 
mean  square  error  with  a  relatively  large  error  in  parameter  identification,  due  to  the 
complexity  of  the  error  surfaces.  One  major  drawback  of  the  proposed  algorithms,  as 
pointed  out  in  the  last  chapter,  is  the  parameter  bias  in  the  presence  of  a  colored 
disturbance.  If  parameter  identification  is  desired,  this  drawback  may  restrict  the 
algorithms  for  some  applications.  However,  if  only  minimization  of  the  mean  square 
output  error  is  necessary,  the  proposed  algorithms  still  have  a  large  variety  of  practical 
applications.  On  the  other  hand,  there  are  many  situations  where  the  disturbance  can 
be  absent,  or  can  be  modeled  as  a  white  noise  sequence.  In  these  cases  the  proposed 
algorithms  will  achieve  both  minimization  of  the  mean  square  error  and  parameter 
identification  in  the  sufficient  order  case.  For  the  reduced  order  case,  the  proposed 
algorithms  have  the  potential  of  achieving  global  convergence  as  opposed  to  other 
existing  algorithms.  An  important  application  of  this  case,  echo  cancellation  in  long¬ 
distance  telephone  systems,  will  be  studied  in  this  chapter. 
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3.1  The  Principle  of  Echo  Cancellation 

The  problem  of  echo  in  long-distance  telephone  systems  becoming  of  great  concern 
to  engineers  can  be  dated  to  more  than  a  half-century  ago  [84].  Some  classic  methods 
were  developed  which  effectively  eliminated  this  problem  for  systems  having  a  time 
delay  of  less  than  100  ms  at  that  time.  Recently,  due  to  the  development  of  satellite 
communication  technique,  larger  time  delay  and  hence  more  severe  echoes  provoked 
new  blooming  for  research  activities  in  echo  cancellation  [13]  -  [l9].  [4l]  -  [42].  [84]  - 
[85].  The  employment  of  echo  cancelers  has  proved  necessary  in  expanding  telephone 
networks  to  provide  a  truly  world-wide  service. 

The  cause  of  echo  in  a  typical  4- wire  telephone  system  [85]  is  demonstrated  in 
Figure  17.  It  is  a  symmetric  system  for  the  telephone  user  1  and  the  user  2.  Suppose 


Figure  17  A  4-wire  long-distance  telephone  system 
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user  1  is  speaking  and  user  2  is  listening.  The  speech  signal  of  user  1  is  first  transmitted 
from  his  telephone  set  through  a  2-wire  line  to  a  local  office.  In  the  local  office  a  hybrid 
circuit  H  which  acts  as  a  bridge  between  2-wire  (bidirectional)  and  4-wire 
(unidirectional)  lines  connects  this  speech  signal  to  the  upper  path  of  the  4-wire  line. 
The  signal  is  then  transmitted  over  a  long  distance  by  the  4-wire  line  through  the  upper 
path.  The  additive  noise  represents  the  noise  incurred  in  the  2-wire  line  and  at  the 
hybrid  circuit,  which  can  be  well  modeled  as  a  white  noise.  At  the  receiving  end,  the 
hybrid  H  at  a  local  office  near  user  2  sends  the  speech  signal  to  the  2-wire  line  which 
connects  the  telephone  set  of  user  2.  Ideally,  all  the  energy  of  the  incoming  speech 
signal  from  the  upper  path  of  the  4-wire  line  should  be  transmitted  to  the  2-wire  line 
and  to  the  user  2.  This  is  done  to  a  large  extent  in  practice  by  careful  design  of  the 
sophisticated  hybrid  circuit  H.  However,  due  to  the  variation  of  the  number  of 
customers  and  of  the  length  of  the  2-wire  lines.  H  can  never  be  designed  to  match  this 
perfectly.  In  other  words,  the  transfer  function  of  the  path  across  H  along  the  4-wire 
line  denoted  in  Figure  17  as  "echo  path."  is  not  zero.  Thus,  the  speech  signal  of  user  1 
will  leak  out  at  the  receiving  end  and  will  be  transmitted  back  through  the  lower  path 
of  the  4-wire  line,  which  normally  transmits  the  speech  signal  of  user  2.  Since  the 
receiving  principle  for  the  H  at  the  left  is  the  same  as  that  at  the  right,  user  1  will  hear 
the  delayed  version  of  his  own  speech.  This  is  called  talker  echo  [41],  [84]  -  [85].  The 
larger  time  delay  on  the  4-wire  line  (i.e.,  the  farther  the  transmission  distance),  the 
more  severe  and  annoying  the  echo  will  be. 

The  user  2  in  the  above  scenario  may  also  hear  echo,  which  is  called  listener  echo 
[41],  [84]  -  [85].  This  echo  occurs  when  the  talker  echo  signal  traveling  to  the  left  along 
the  lower  path  of  the  4-wire  line  leaks  out  along  the  echo  path  on  the  left  of  the 
transmission  system  due  to  non-zero  transfer  function  at  that  side.  This  echo  signal  is 
superimposed  (time  delayed)  onto  the  main  speech  transmission  stream  on  the  upper 
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path  of  the  4-wire  line,  and  will  be  heard  by  the  listener.  The  listener  echo  is  usually 
less  severe  than  the  talker  echo,  due  to  the  transmission  loss  and  the  loss  along  the  echo 
paths  (the  average  loss  along  the  echo  path  is  6  dB  [85]).  The  listener  echo  can  again 
leak  out  along  the  echo  path  on  the  right  of  the  transmission  system  and  be  transmitted 
back  to  user  1.  which  becomes  2nd  talker  echo.  This  process  can  go  on  to  result  in  2nd 
listener  echo.  3rd  talker  echo.  etc..  However,  due  to  the  losses,  these  high  order  echoes 
are  not  of  much  concern. 

It  is  observed  that  the  transmitted  speech  signal  travels  in  the  4-wire  line  for  only 
one-way.  whereas  echo  signals  travel  for  at  least  a  round-trip.  Thus,  an  early  method 
for  echo  control  is  to  insert  a  3  dB  loss  in  each  transmission  direction  in  the  4-wire  line 
[85].  Although  this  method  suppresses  echo  by  at  least  6  dB.  it  also  introduces  an  extra 
3  dB  loss  to  the  speech  transmission.  A  better  method  using  "echo  suppressors*  were 
then  developed  which  disrupts  transmission  in  one  direction  in  the  4-wire  line  if  speech 
is  transmitted  in  the  other  direction  [42],  [85],  The  echo  suppressors  are  quite  effective 
for  line  delays  less  than  100  ms.  For  larger  delay,  e.g..  in  a  satellite  transmission 
system,  the  clipping  and  the  disruption  become  annoying.  It  is  then  necessary  to  use 
adaptive  echo  cancelers. 

An  adaptive  echo  canceler  is  an  adaptive  filter  placed  in  parallel  to  the  echo  path 
(Figure  18).  Its  task  is  to  simulate  the  transfer  function  of  the  echo  path.  By 
subtracting  the  estimated  returning  echo  y  (n  )  from  the  real  returning  echo  yin  ),  the 
new  returning  echo  ein)  is  suppressed  greatly.  A  figure  of  merit  for  the  effectiveness 
of  an  echo  canceler  is  the  echo  return  loss  enhancement  (ERLE)  which  is  defined  as 
follows  [  1 7]  -  [  1 9]: 

ERLE=  10  log  (59) 
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Usually  20  dB  or  more  ERLE  is  expected  for  effective  echo  cancellation. 

Almost  all  echo  cancelers  today  are  adaptive  FIR  filters.  Although  it  is  simple  to 
implement,  they  cannot  model  the  transfer  functions  of  the  echo  path  well.  In  order  to 
achieve  satisfactory  performance,  the  number  of  weights  in  the  cancelers  must  be  large 
(e.g  .  128  weights  for  satellite  transmission,  [85]),  which  means  high  cost  and  high 
computational  complexity.  This  motivates  the  investigation  of  adaptive  HR  echo 
cancelers. 


3.2  Adaptive  IIR  Echo  Cancelers 


Referring  to  Figure  18,  it  is  assumed  that  user  2  is  quiet.  In  case  of  double-talking 
(when  user  2  tries  to  interrupt),  it  is  assumed  that  there  is  some  mechanism  (speech 
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detector)  to  freeze  the  adaptation  process  [85].  If  the  transfer  function  of  the  echo  path 
is  considered  as  an  unknown  dynamic  plant,  then  the  system  described  in  Figure  18  is 
exactly  the  same  as  that  in  Figure  1.  Since  the  transfer  function  of  the  echo  path  can  be 
well  modeled  as  a  rational  function  [85],  it  is  much  more  suitable  to  use  an  adaptive  IIR 
filter  to  simulate  or  to  identify  it  than  an  adaptive  FIR  filter.  The  proposed  algorithm 
and  the  previous  analysis  can  be  applied  to  the  echo  cancellation  as  depicted  in  Figure 
18  immediately.  A  number  of  computer  simulations  for  various  situations  are 
conducted  and  will  be  presented  in  this  section.  In  order  to  save  computation,  the  AFM 
algorithm  given  by  (24)  is  used  throughout  this  section.  As  mentioned  before,  due  to 
slow  adaptation  the  AFM  algorithm  is  a  close  approximation  of  the  IF  algorithm  whose 
convergence  behavior  is  well  studied  in  the  last  chapter.  Rational  functions  are  used  to 
model  the  echo  path  transfer  function.  The  AFM  algorithm  is  implemented  and 
compared  with  SHARF  and  an  adaptive  FIR  canceier.  The  disturbance  (v(n)}  is 
assumed  to  be  a  white  noise  sequence.  The  results  will  be  presented  for  sufficient  order 
case  and  reduced  order  case  respectively. 

3.2.1  Sufficient  Order  Case 

An  echo  path  transfer  function  usually  exhibits  an  all-pass  characteristic  [85], 
Hence,  the  echo  path  in  this  case  is  modeled  as  a  second  order  all-pass  filter  with  two 
complex  conjugate  poles  located  at  0.95e  ±J  30°  and  two  zeros  at  1 0526e  tJ  30°.  The  gain 
is  selected  as  0.44  such  that  the  loss  for  all  frequencies  is  about  6  dB.  as  mentioned 
before.  Figure  19  shows  the  magnitude  of  the  frequency  response  and  Figure  20  shows 
the  phase.  The  canceier  is  a  second  order  adaptive  IIR  filler  with  five  adaptive 
coefficients  ax(n  ).  a2(n  ).  and  8„(n  )  -  b2(n  ).  Three  kinds  of  input  signals  are  used:  a 
digitized  speech  signal  of  the  sentence  "you  had  a  problem  with  the  blush"  by  a  male 


68 


i3i 

■ 


voice  (Figure  21).  a  digitized  babble  signal  uttered  by  many  speakers  simultaneously 
(Figure  22).  and  a  zero  mean,  unit  variance  stationary  white  Gaussian  noise  sequence. 
The  sampling  frequency  for  the  speech  and  the  babble  is  20  kHz.  Note  that  the  babble 
signal  is  essentially  a  band-limited  noise.  In  order  to  cope  with  the  drastic  magnitude 
change  of  the  speech  signal,  the  AFM  algorithm  is  implemented  with  gain 

normalization,  i.e.,  t  is  replaced  by  — - , -  (see  (24)  ).  This 

£y  ’(n  ~i  )2+  £  x  '( n  -j  )2 
<  =1  j  =o 

regulates  the  convergence  speed  quite  effectively  but  does  not  change  the  convergence 
properties  much,  although  extra  computation  is  required.  Simulation  results  for 
v(n  )=0  will  be  presented  first,  in  which  case  all  adaptive  coefficients  will  start  from 
zero  initial  values. 


Figure  21  Digitized  speech  signal 
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Figure  22  Figure  22  Digitized  babble  signal 


The  first  group  of  plots  shows  the  basic  simulation  results  using  the  adaptive  HR 
echo  canceler  for  the  three  different  kinds  of  input.  For  the  speech  signal  as  the  input. 
Figure  23  shows  the  decrease  of  the  mean  square  returning  echo  E  {e  (n  )2}  as  in  Figure 
18.  and  Figure  24  shows  the  convergence  path  for  the  two  adaptive  coefficients  in  the 
denominator.  The  error  surface  contour  is  not  analyzed  for  this  case  due  to  the 
complexity  of  the  speech  signal.  Only  stability  boundary  is  drawn  in  Figure  24.  From 
the  pole  location  of  the  plant  the  values  of  the  coefficients  of  the  denominator  are  easily 
calculated  as  a  >  =  1.6454  and  a2=— 0.9025.  Convergence  of  the  adaptive  coefficients  to 
this  point  is  achieved  after  50.000  adaptations.  Figure  25  shows  the  returning  echo  for 
the  babbie  input,  and  Figure  26  shows  the  convergence  path  for  the  same  adaptive 
coefficients  as  in  Figure  24.  Figures  27  and  28  are  the  same  but  for  the  white  noise 
input.  The  value  r«0.04  is  used  for  all  these  three  cases.  It  can  be  noticed  that  the 
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number  of  adaptation  decreases  as  the  input  signal  changes  from  the  speech  signal  to  the 
white  noise.  However,  parameter  convergence  is  achieved  for  all  these  three  cases, 
although  Figure  24  exhibits  more  irregularity  for  the  speech  signal  input  case. 
Rigorously  speaking,  speech  signals  are  not  stationary.  Hence,  they  violate  one  of  the 
basic  assumptions  for  the  input  signal  in  the  convergence  proof  (Section  2.3).  In 
practice,  however,  the  generalization  of  white  noise  to  speech  signals  when  one  proceeds 
from  theory  to  practice  is  widely  accepted  [13]  -  [19].  The  above  computer  simulations 
also  show  that  this  generalization  is  not  at  all  unreasonable.  In  other  words,  the 
conditions  on  the  input  signals  for  the  convergence  might  be  relaxed. 

Next,  the  experiment  for  the  same  adaptive  HR  echo  canceler  with  these  three 
inputs  is  repeated  and  compared  with  an  adaptive  FIR  echo  canceler.  The  number  of 
weights  in  the  FIR  canceler  is  32.  They  are  all  set  to  zero  at  the  beginning  of  the 

adaptation.  The  gain  is  also  normalized  to  -  where  r  (step  size)  is  always 

£ x  (n  —  t  )2 

i=0 

chosen  to  be  the  same  as  that  in  the  HR  canceler  (0.04).  The  number  of  multiplications 
at  each  adaptation  is  roughly  32  for  the  FIR  canceler,  and  9  for  the  HR  canceler.  The 
ratio  is  about  3.5.  Two  kinds  of  comparison  between  the  FIR  canceler  and  the  I1R 
canceler  should  be  kept  in  mind  when  viewing  the  simulation  results:  i)  for  the  same 
amount  of  computation,  a  comparison  of  performance:  and  ii)  for  the  same 
performance,  a  comparison  of  computation.  Figures  29  and  30  show  the  results  with 
the  speech  signal  input.  The  mean  square  returning  echo  is  shown  in  Figure  29.  and  the 
ERLE  defined  by  (59)  (note  that  v(n  )=0)  is  shown  in  Figure  30.  Note  that  due  to  tie 
large  number  of  adaptations  required,  the  input  sentence  is  repeated  for  as  many  times 
as  needed.  It  is  seen  from  Figure  30  that  the  FIR  canceler  achieves  a  steady  state  ERLE 
(20  -  50  dB)  after  37.500  adaptations  whereas  the  HR  canceler  needs  more  adaptation 


Figure  29  Mean  square  returning  echoes  for  speech  input,  compared  with 

an  FIR  canceler 
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and  can  achieve  higher  ERLE.  For  the  same  amount  of  computation  as  for  the  FIR 
canceler  with  37.500  adaptations,  the  IIR  canceler  can  have  about  3.5x3 7. 500=  130k 
adaptations  where  its  ERLE  is  75  -  110  dB.  On  the  other  hand,  for  the  IIR  canceler  to 
achieve  20  -  50  dB  ERLE.  it  only  needs  40k  adaptations,  which  is  much  less  than  130k. 
Thus,  the  IIR  canceler  performs  much  better  than  the  FIR  canceler  in  these  senses.  The 
potential  for  the  IIR  canceler  is  even  more  exciting:  at  150k  adaptations  in  Figure  30. 
the  ERLE  reaches  the  average  of  100  dB  and  is  still  increasing.  Figures  31  and  32  show 
the  same  comparison  for  the  babble  input,  and  Figures  33  and  34  are  for  the  white  noise 
input.  The  IIR  canceler  outperformed  the  FIR  canceler  in  the  above  sense  for  all  these 
three  inputs.  However,  the  price  is  slower  convergence.  It  is  especially  clear  from 
Figures  33  and  34  for  the  white  noise  input.  In  Figure  34,  the  FIR  canceler  quickly 
converges  to  its  steady  state  value  22  dB  after  2500  adaptations,  whereas  for  the  IIR 
canceler  it  takes  6250  adaptations,  although  the  amount  of  computation  at  this  point 
for  the  IIR  canceler  is  only  about  70  percent  of  that  of  the  FIR  canceler  at  2500 
adaptations. 

The  potential  that  the  IIR  canceler  can  achieve  high  ERLE  is  further  investigated 
for  the  white  noise  input.  Figure  35  shows  an  overall  view  of  Figure  34,  where  it  is 
seen  that  after  40k  adaptations  the  ERLE  of  the  IIR  canceler  settles  at  110  dB  level.  It 
is  apparent  that  the  IIR  canceler  outperforms  the  FIR  canceler  tremendously  in  this 
sense. 

The  effect  of  t  in  the  convergence  of  the  ERLE  for  the  IIR  canceler  is  shown  in 
Figure  36  for  r=0.02,  0  04,  and  0.08,  respectively.  The  curve  in  the  middle  (r=0.04) 
corresponds  to  that  in  Figure  35.  It  is  seen  that  the  slope  of  the  curve  for  r=0.08  is 
twice  as  much  as  the  slope  for  the  curve  r=0.04,  which  is  in  turn  twice  as  much  as  that 
for  r=0.02.  This  is  simply  because  t  =n  r.  In  other  words,  the  convergence  speed  is 
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Figure  31  Mean  square  returning  echoes  for  babble  input,  compared  with 

an  FIR  canceler 
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Figure  32  HRI.F's  for  babble  input,  compared  with  an  FIR  canceler 
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Figure  36  KRl.E's  for  different  values  of  t.  white  noise  input 


proportional  to  r.  The  variance  of  the  ERLE  at  the  convergence  point,  as  can  be  seen 
from  the  amount  of  variation  for  the  leveled  off  part  of  the  three  curves  on  the  top  of 
Figure  36,  also  increases  as  r  does,  as  expected.  What  is  surprising  in  this  figure  is  that 
the  converged  mean  value  of  the  ERLE  also  depends  on  t,  and  increases  slightly  as  r 
does.  At  this  point  its  cause  is  not  clear  yet. 

The  AFM  algorithm  is  also  compared  with  SHARF  algorithm  (7)  for  IIR  cancelers. 
In  implementing  SHARF  algorithm,  the  error  smoothing  filter  is  selected  as  Cp- 0.8 
and  c2=0  so  that  the  SPR  requirement  in  (8)  is  satisfied.  In  order  to  achieve  comparable 
results,  the  adaptive  gain  in  SHARF  algorithm  is  also  normalized  to 

J 

-* - * -  with  t=0.04  which  is  the  same  as  that  used  in  the  AFM 

Zy(n-i)2+Zx(n-;)2 
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algorithm.  With  the  speech  signal  as  the  input,  the  results  are  shown  in  Figures  37  and 
38.  Differing  from  the  AFM  algorithm,  SHARF  starts  to  show  significant  returning 
echo  reduction  fairly  late,  but  then  it  reaches  the  steady  state  very  quickly.  This  is 
because  SHARF  uses  a  different  convergence  strategy  than  AFM.  The  convergence  path 
of  the  adaptive  coefficients  for  SHARF  is  very  different  from  that  of  AFM.  Mote  in 
Figure  38  that  the  steady  state  ERLE  levels  for  the  two  IIR  canceler  algorithms  are 
almost  the  same. 

All  the  above  experiments  are  conducted  assuming  v(n  )=0.  In  reality.  {v(n  )}  is 
generally  non-zero,  and  can  be  modeled  as  a  white  noise  sequence.  An  experiment 
similar  to  that  shown  in  Figures  29  and  30  is  performed  with  !v  (n  )}  being  a  zero  mean 
white  Gaussian  noise  sequence  with  variance  0.0001.  The  same  IIR  canceler  and  FIR 
canceler  are  used.  Due  to  the  disturbance  v  (n  ).  the  (normalized)  adaptive  gain  must  be 
much  smaller  than  before  (r=0.0016)  to  achieve  satisfactory  convergence  and  to  ensure 
the  stability  of  the  IIR  canceler.  In  order  to  avoid  an  excessive  number  of  adaptations 
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Figure  38  ERI.F.'s  for  speech  input,  compared  with  SHARP 
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for  such  a  small  adaptive  gain,  the  HR  canceler  is  initialized  near  the  convergence  point. 
It  is  chosen  that  a!(0)=1.6.  a2(0)=— 0.8,  b0( 0)=0.5,  b  1(0)=—0.8,  and  b2(0)= 0.5.  The 
FIR  canceler  is  initialized  with  the  values  of  the  first  32  samples  of  the  impulse  response 
of  the  initial  IIR  canceler,  B  (0.z~l)/A  (0,z-i),  which  is 

h  (n  )=0.625S(n  )— 2  Re{(0.0625+ j  0.125 )(0.8+ j  0.4)"  };  n=0.1,  •  ■  ■  .31.  (60) 

This  initialization  is  also  reasonable  in  a  practical  sense.  Usually  an  echo  canceler  is 
used  for  a  long  time  after  installation;  hence,  every  time  it  is  turned  on  it  always  starts 
adaptation  from  some  nominal  value  near  the  optimum  rather  than  from  zero  as  it  is 
first  installed. 


The  results  of  this  experiment  are  shown  in  Figures  39  and  40.  Note  that  for  the 
definition  of  ERLE  as  in  (59).  when  the  signal  part  in  y  (n  )  becomes  small,  as  a  speech 
signal  often  does,  both  y  (n  )  and  e  (n  )  are  dominated  by  v  (n  )  and  the  ERLE  becomes 
zero.  Thus,  the  ERLE  curve  constantly  oscillates  between  zero  and  its  functional  value. 
Our  experience  indicates  that  the  situation  can  be  so  bad  that  one  cannot  obtain  any 
information  from  the  ERLE  curve  thus  defined,  and  hence,  it  essentially  becomes 
useless.  A  modified  definition  used  in  [18]  can  overcome  this  difficulty: 


ERLE  =  101og 


E  {[y  (n  )— v  ( n  )]2} 
E{[e(n  )—v(n  )]2} 


(61) 


For  v(n)=0.  (61)  coincides  with  (59).  Note  that  this  definition  is  not  practical  since 
v  (n  )  is  not  measurable.  However,  it  does  give  some  indication  about  how  well  an  echo 
canceler  is  functioning  in  the  presence  of  v(n),  especially  for  real  speech  input.  The 
ERLE  curves  in  Figure  40  are  the  results  of  the  experiment  using  definition  (61).  It  can 
be  seen  from  these  two  figures  that  the  IIR  canceler  still  outperforms  the  FIR  canceler 


while  both  are  degraded.  However,  the  improvement  of  the  IIR  canceler  over  the  FIR  is 


much  less  than  the  noise-free  case,  because  the  noise  causes  the  adaptive  coefficients  to 


wander  around  the  true  value  with  a  much  larger  variance. 


3.2.2  Reduced  Order  Case 

To  construct  a  reduced  order  situation,  the  echo  path  transfer  function  of  the  last 
subsection  (sufficient  order  case)  is  modified  by  adding  a  pole  at  0.7  and  a  zero  at  1.4286 
so  that  it  still  has  an  all-pass  characteristic  (see  Figures  41  and  42).  The  gain  is  0.3  so 
that  the  loss  is  about  6.5  dB.  The  same  adaptive  HR  canceler  and  FIR  canceler  are  used 
with  the  three  inputs  for  v(n)=0.  The  constant  r  in  the  normalized  gain  of  the 
cancelers  is  adjusted  to  have  the  value  0.01.  All  the  adaptive  coefficients  are  initialized 
with  zero  value. 

Figures  43  and  44  show  a  similar  comparison  between  the  HR  canceler  and  the  FIR 
canceler  as  before.  It  may  seem  surprising  that  the  HR  canceler  not  only  performs 
poorer  than  the  FIR.  it  also  shows  nearly  no  reduction  in  the  mean  square  returning 
echo  (Figure  43)  and  no  increase  in  ERLE  (Figure  44).  This  is  probably  because  of  i) 
possible  shallow  holes  of  the  error  surface  as  a  (usually  complicated)  function  of  the 
five  adaptive  coefficients,  and  of  ii)  the  non-stationarity  of  the  speech  signal.  It  is 
obvious  that  the  global  minimum  value  of  the  error  surface  for  the  HR  canceler 
decreases  as  the  number  of  the  adaptive  coefficients  increases.  It  reaches  zero  when  a 
sufficient  order  situation  is  obtained.  Similarly,  the  only  minimum  value  of  the 
quadratic  error  surface  for  the  FIR  canceler  also  decreases  as  the  number  of  weights 
increases.  However,  it  will  never  reach  zero  if  the  unknown  dynamic  plant  has  a 
rational  transfer  function.  What  happens  here  is  probably  that  the  global  minimum  of 
the  error  surface  for  the  adaptive  IIR  canceler  is  larger  than  the  minimum  of  the 
quadratic  error  surface  for  the  adaptive  FIR  canceler.  It  should  not  be  surprising  to  see 
further  deduction  in  the  mean  square  returning  echo  (eventually  to  zero)  using  the  HR 
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canceler  if  the  number  of  adaptive  coefficients  increases. 

An  overall  view  of  the  above  situation  for  the  HR  canceler  can  be  seen  in  Figures 
45  and  46  for  the  returning  echo  and  the  convergence  path  of  the  coefficients  in  the 
denominator.  The  adaptive  coefficients  do  converge  to  some  point  as  seen  in  Figure  46. 
It  is  interesting  to  see  the  repeating  pattern  of  the  convergence  path  as  the  input 
sentence  is  repeated.  Even  at  the  convergence  point,  the  adaptive  coefficients  circle  along 
a  "P"  shape,  resulting  in  the  large  error  at  a  certain  portion  of  the  input  signal  (Figure 
45).  This  is  due  to  the  peculiarity  (e.g..  non-stationarity)  of  the  speech  signal,  which  is 
not  obviously  shown  in  the  sufficient  order  case  before.  "Better"  input  signals:  the 
babble  signal  and  the  white  noise  sequence  are  used  for  the  same  experiment,  and  the 
results  are  shown  in  Figures  47  -  50.  For  the  babble  input  (Figures  47  and  48),  decrease 
of  the  mean  square  returning  echo  for  some  portion  of  the  input  signal  is  obvious  while 
increase  for  some  other  portion  still  exists.  The  convergence  of  the  adaptive  parameters 
is  seen  better  than  that  with  the  speech  signal  input  in  the  sense  that  the  variance  is 
much  smaller.  White  noise  is  perhaps  the  most  ideal  input  (Figures  49  and  50). 
Decrease  of  the  mean  square  returning  echo  and  convergence  to  approximately  the  same 
point  is  observed.  It  should  be  noted  that  the  mean  square  returning  echo  reaches  a 
steady  value  at  only  10k  adaptations  (two  percent  of  the  total  number  of  adaptations), 
whereas  it  took  almost  500k  adaptations  to  achieve  parameter  convergence. 

The  above  HR  canceler  (using  AFM  algorithm)  is  also  compared  with  the  one  using 
Stearns'  algorithm  (6)  for  this  reduced  order  case.  The  gain  in  (6)  is  also  normalized  to 

— - - -  with  T-0.01.  From  the  results  in  Figures  51  and  52  it  is 

ZyU-iy+ZxU-j? 

1=1  J =0 

seen  that  the  AFM  algorithm  performs  slightly  better  than  Stearns'  algorithm.  Since 
Stearns'  algorithm  is  a  gradient  algorithm,  this  ve  .fies  that  the  AFM  algorithm  has  not 
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Figure  51  Mean  square  returning  echoes  for  speech  input,  compared  with 
Stearns'  algorithm,  reduced  order 
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reduced  order 


missed  other  local  minimum  points  that  are  lower  than  the  one  that  AFM  is  aiming  at. 

Finally,  the  AFM  canceler  is  compared  with  the  SHARF  canceler  used  in  the  last 
subsection  with  r  changed  to  0.01.  Note  that  the  SPR  requ..ement  (8)  is  no  longer 
satisfied  with  c  ,=-0.8  and  zero  otherwise.  However,  it  can  be  checked  that  the  SPR 
requirement  is  violated  for  only  a  small  amount  on  a  small  portion  of  the  unit  circle. 
Based  on  the  argument  that  the  SPR  requirement  might  be  overly  restrictive  [59],  the 
SHARF  canceler  is  implemented  the  same  as  before  without  any  other  modification. 
Figures  53  and  54  show  the  results,  from  which  SHARF  is  seen  to  perform  much  worse 
than  the  AFM  algorithm.  This  confirms  the  observation  of  SHARF's  misbehavior  in  the 
reduced  order  case  [62],  [64]. 

From  the  above  rich  collection  of  examples  it  is  clearly  seen  that,  in  general, 
adaptive  HR  cancelers  may  not  perform  as  well  as  adaptive  FIR  cancelers  in  reduced 
order  case.  However,  if  one  selects  the  order  correctly,  e.g.,  if  a  3rd  order  adaptive 
canceler  is  used  in  this  subsection,  it  is  obvious  that  the  same  results  as  those  in  the  last 
subsection  for  sufficient  order  case  may  be  expected.  In  other  words,  the  3rd  order  HR 
canceler  using  the  AFM  algorithm  would  outperform  the  FIR  canceler  having  32  (or 
more)  taps,  although  more  adaptations  may  be  needed.  No  more  such  examples  shall  be 
included  here. 


SHARP 


4.  CONCLUSION 


A  family  of  new  adaptive  HR  filtering  algorithms.  SIM  algorithm.  AFM  algorithm, 
and  IF  algorithm,  is  proposed  based  on  the  Steiglitz-McBride  identification  scheme.  It  is 
proved  using  a  theorem  of  wide-sense  convergence  in  probability  that  the  IF  algorithm 
converges  to  the  solution  of  an  associated  ODE  under  fairly  general  conditions.  It  is 
further  proved,  imposing  more  conditions,  that  the  filter  coefficients  converge  to  the  true 
system  parameter  values  in  sufficient  order  case.  The  AFM  algorithm,  which  is  shown 
to  be  a  close  approximation  of  the  IF  algorithm,  is  implemented  to  verify  the  above 
theoretical  results.  For  reduced  order  case,  a  number  of  computer  simulations  using  the 
AFM  algorithm  exhibits  global  convergence  regardless  of  local  minima.  The  proposed 
algorithms  are  also  applied  to  the  problem  of  echo  cancellation.  A  large  variety  of 
simulations  using  various  input  signals  is  performed  and  compared  with  adaptive  FIR 
echo  cancelers  and  other  existing  adaptive  IIR  algorithms.  The  results  are  favorable. 

It  is  shown  in  Section  2.2  that  the  family  of  algorithms  is  derived  from  traditional 
equation  error  approach.  With  pre-filtering,  the  inherent  flaw  of  the  equation  error 
method,  parameter  bias  in  the  presence  of  a  disturbance,  is  removed  to  the  extent  that 
only  a  colored  disturbance  would  cause  parameter  bias.  The  effect  of  this  bias  in 
relation  to  the  input  and  the  unknown  plant  is  studied.  For  a  white  disturbance,  the 
algorithm  is  shown  to  converge  exponentially. 

It  should  be  pointed  out  that  the  problem  of  the  adaptive  ITR  filtering  in  the 
reduced  order  case  is  quite  difficult.  Some  researchers  approached  this  problem  by 
finding  error  bounds  so  that  the  filter  can  be  guaranteed  at  least  net  to  "blow  up"  [86]. 
[87],  The  global  convergence  of  the  proposed  algorithms  in  Section  2.4  is  unique  and 
interesting.  Although  this  observation  only  gives  a  limited  view,  it  deserves  the 
following  conjecture.  For  reduced  order  cases,  if  all  conditions  in  Theorem  1  and 


Theorem  2  of  Section  2.3  are  met.  then  the  IF  algorithm  converges  to  the  best  fit,  i.e..  the 
global  minimum  point  of  the  mean  square  error  surface,  if  such  a  minimum  exists.  In 
practice,  however,  the  effectiveness  of  using  adaptive  HR  filters  as  opposed  to  adaptive 
FIR  filters  depends  largely  on  the  correct  choice  of  the  filter  orders.  Even  with  the 
global  convergence,  an  adaptive  HR  filter  may  still  be  inferior  to  an  adaptive  FIR  filter 
in  the  reduced  order  case,  as  seen  from  the  application  examples  in  Section  3.2.2. 

One  major  purpose  of  having  constant  gain  for  adaptive  filtering  is  to  track  time- 
varying  systems.  However,  due  to  the  complexity  of  time-varying  situations,  not  much 
analysis  has  been  done  [72],  [88],  More  powerful  convergence  theorems  (e.g.,  [74])  may 
be  needed,  and  stronger  conditions  may  have  to  be  imposed  (e.g..  [89]  to  allow  S  -*oo  in 
(29)  and  (30)).  Having  dealt  with  only  the  stationary  situation  and  a  specific  area  of 
application,  this  dissertation  can  be  a  necessary  pre-study  for  aforementioned  more 
complicated  cases,  and  be  a  demonstration  of  potentially  wider  application  of  recent 
weak  convergence  results  [7l]  -  [74]  to  various  adaptive  filtering  algorithms  [49]  -  [51]. 
[59]  -  [67].  The  proposed  algorithm  might  be  further  modified  to  estimate  the 
disturbance  so  that  the  bias  in  the  presence  of  colored  disturbances  can  be  removed. 
The  problem  of  stability  monitoring  needs  to  be  studied  further.  Other  areas  of 
application  are  also  possible.  In  short,  the  adaptive  IIR  filtering  is  a  young  and 
prosperous  research  area.  The  proposition  of  the  family  of  new  algorithms  only  opens 
an  additional  window  which  will  hopefully  bring  in  some  fresher  air. 
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PROOF  OF  THE  LEMMAS 

A )  Proof  of  Lemma  1 : 

From  (34),  it  follows  that 

Y (n  ,  ei)=JP(d1)n  Y (0 )+"£/■  (0,)' G  (SiX/ (n -i ) 

i  =o 

Y  (n  .  92)=F  (92r  Y  (0)+"£ F(92)‘G  (02)i/  (n  -i  ) 

i  =0 

Thus. 

||  Yin.  90-Y  (n  .  02)  ||  =  ||  (0^  -F  (02y  ]Y  (0) 

+  zV  (0,)'  G  (0j)~.F  (02)‘  G  (02)]f/  (n  -i )  || 

i  =0 

^IlF^^-fC^y  ||  ||r(o)  || 

+  I1H/’(0i)'G(0i)-F(02)'G(02)||  \\U(n-i)\l  (Al) 

I  =0 

We  now  estimate  ||  FC0,)"  -F{92Y  ||  and  ||  F(0,)' G  (0,)— F(02)‘C  (02)  ||  • 

I  =0 

||(/(n— i)||  respectively. 

Using  the  identity  A  ]  —A  \  =A  i  —A  ]  -1A  2+A  j  ~‘A  2— .4  ]  ~2A  ?  +  ■  •  +A  XA  2  -1 
—A  5  .  the  first  estimate  is 

||F(0,r-f(02)n  ||  =  ||  lV(01)n-1-^[/-(01)-F(02)]F(02V  || 

o 


<  £  limy-5-;  ||  ||jf(01)-jf(02)||  ||F(@2y  u 

j=0 


110,-02 1|  I  Wfwj'-'-*  ||  II  Fiky  II . 

;=0 


Let  A,  (§)  denote  the  i  th  eigenvalue  of  F  (0).  By  the  assumption  i),  there  exists  0< A  <  1 
such  that  I  A*  (0)|  <A  for  all  i  and  all  9tDc .  Hence,  by  [90,  Theorem  5],  there  exists 
C3> 0  such  that  ||  F(9y  ||  <C3A'  .4  So  (A2)  can  be  written  as 

||  FiW-FiOiT  ||  ^CjC32  ||  0i~ 02  ||”Z  A"  ~1=C  xCf  ||  9l—92  II n  An_1-  (A3) 

j=0 

It  is  easily  seen  by  comparing  two  adjacent  terms  that  as  n  increases,  the  sequence 

n  \n  -1  reaches  its  maximum  value  1)  at  N  =[-A_]  where  again  [•]  denotes  the  upper 

1  A 

nearest  integer,  and  then  decreases  monotonically  to  zero  as  n  -*oo.  Hence.  (A3)  is 


bounded  by 


||  FWtT  -F{92T  ||  jC  3N  A*v“l  ||  0,-02  || 


Next,  estimate 


£  ||F(ei),G(0,)-F(fl2),G(e2)||  || U(n—i 

i  *0 


=  £  ||  F (0,)‘  G  (0,)-F (§,)'  G  (02)+/r  (flj)' G  (92)-F  (92)‘  G  (02)  ||  ||  U  (n  -i )  || 


£||m)‘  I!  ||G(0,)-G(02)||  ||  LHn-i)  II 


4  Although  in  [90,  Theorem  5]  the  conclusion  is  ||  A  1  ||  2^C  jV  wher-  ||  A  ||  2=[p(ArA  it  is  readi¬ 
ly  seen  by  the  equivalence  of  norms  [78]  that  ||  A  >  [|  $C'  ,  ||  A  ‘  ||  2^C'  jC  jAy. 


I 

( 

I 

+"z  |!/r(0i),-^(02)'  ||  ||C(02)||  II  U  in  -i  )  ||  j 

i  =0  I 


C  ,C  3  II  M2  II  Z'x'  II  u  (n  -i  )  II  +C  ,C  32  II C  (0,)  ||  ||  0.-0,  ||  Z  i  X'  -  ||  U  (n  -«  )  || 

i=0  I =0 


where  the  last  step  is  obtained  using  (A3).  Now  the  total  estimate  is 


IrCn.O^-yCn.^) 


n  — 1 


\C  .C  Xv  -1  II  Y  (0)  ||  +C  2C  3  Z  X1  II  U(n-i)  || 

i  =o 


n  —1 


+  C.C  32  ||G(02)||  ^tX'^Hi/Cn-OII 

i  =o 


II M2 II 


=Y  ||  0.-02  || 


It  is  obvious  that  Y>0.  Its  mean  is 

E  { Y}  =C  .C  j ?N  X  v  -‘  ||  y  (0)  ||  +C  2C  I  ||  (/  (n  -»  )  ||  f £ X' 

i=0 


—l 


+C  ,C  j3  ||  G  (02)  ||  E  { ||  U  (n  -t )  || }  Z  i  X' 


I  =0 


Obviously  £X'=  — — — r-<°°-  Now  since  iX,_1=— —  X'  and  n  is  finite,  it 
,=o  1— X  1— X  a  X 


follows  that 


n  —1 


n  —l 


Z*x‘~1=Tx  2Lx'  = 

,  =o  "X  i<o  “  X 


d  1 — X” 


1 


1— X  (l— X)2 


nX’-'O-O+X"  <  1 

(1-X)2  "(l-X)2 


<oo 


Thus  by  the  assumptions  it  can  be  concluded  that  E  (Y)  <oo. 


B)  Proof  of  Lemma  2: 


Let  h  (n).  l^i  ^4.  —  oo<n  <eo  denote  the  unit  pulse  response  of  the  i  th  filter 
(for  causal  filters.  O^n  ^oo,  which  falls  into  the  more  general  case  considered  here). 
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Then 


y,  (n  )=  £  (*  ~ n‘  )u,  (n  '). 


i) 


E  |y((n  )|=£  |  ^  (n  —  n  )ui  (n  ‘)l  ^  £  l*i  («  _ n  )\E  |u,  (n  ‘)l 


<A/,  £  |/ij  (n1  )|  \K  =K  i<oo. 


The  last  step  is  obtained  since  all  poles  inside  the  unit  circle  implies  that 
£  \h,  Cn'  ) |  < K  <oo  for  all  1  ^4  [9 1  ]. 


ii) 


y,  (n  )yj  (m  )=  £  /i,  (n  —n  )u,  (n  ')  £  hj(m-m'  )ujCm') 


Iffy,  (n}yy(m)|  |  =  |  £  hjCn—n')  £  /i;  Cm  —  m'  )E  U;  (n  ’)uy  (m  ')!  I . 


By  Cauchy-Schwarz  inequality  \E\XY  )  |  ^(£ \X2)E  [Y2])1'2  [92].  we  have 


| E{ui(n‘)uj  Cm  ')}  I  ^  (u,  (n  ')2}E  lu,  (m ’)2} 


1/2 


< M - 


Hence 


\E  {y,  Cn  )y;  Cm  )}  |  ^M2\  £  h,(n')ll  £  h;  Cm’  )|  ^M2K2=K2<00 


iii)  Analogous  to  i)  and  ii).  we  have 
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|£{u,  (n  In,  (m (£')}  I 


<A3  [flu,  («  *)2u,  (m  ')2|£  {wt  (f)2|  j^A^,1'2  [f{U;  (n  (m  02} 
$K3M2V2  jf  {u,  (n  ')4}£iu,  (m  O'4)  A3(A/2A/4)1/2=A3<°o 


iv)  Same  as  before: 


l£{yi  (n  )y,  (m  )y*  (£)y,  (£)}]<  I  X  M'l')ll  Z  7i;(m')||  Z 

n  oo  nt  ’=—00  — 00 
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Cl  Proof  of  Lemma  3: 

The  proof  follows  Benveniste  et  al.  [71]  closely. 

For  simplicity,  we  drop  0  in  notations.  Using  the  same  definition  for  matrix  norm 
as  before,  we  have: 
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where  the  inequality  (a  +b  )2^2(a  2+b~)  is  again  used.  Now  again  using  Cauchv- 
Schwarz  inequality  as  in  the  proof  of  Lemma  2.  we  have: 
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Now  from  Lemma  2  we  know  that  £y2(n  —j)*<KA<oo  and 
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For  the  stable  filters  considered  here,  their  unit  pulse  responses  always  have  an 
exponential  form: 
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for  all  n  ,  i .  and  for  all  QeDc .  By  the  same  reasoning,  we  have 
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Thus,  we  finally  obtain  the  estimate 
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Then  by  the  same  reasoning  as  in  i).  we  immediately  have  the  desired  result. 
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